CHAPTER 1


Perspective 


Advances in integrated circuit technology have had a revolutionary impact on computer 
system design. A chip today integrates far greater sophistication and computing power than 
ever before. Fabrication processes have progressed rapidly so that chips with one million
components are a reality, and enthusiasts predict chips with up to one hundred million components
within a decade. Indeed, it is expected that if ion beam etching techniques became viable
for “printing” chips directly, then minimum feature sizes would drop by a factor of ten, thus
allowing a hundred-fold increase in the number of components on a chip.


More significantly, the new technology encourages custom design of special purpose in- 
tegrated systems for solving very large scale sophisticated problems. No longer is it necessary 
to use a single conventional architecture for solving diverse problems. Instead, the computa- 
tional structure of a problem may be mapped directly into hardware. This has shifted the 
emphasis from searching for algorithms, necessarily convoluted to suit a given architecture, to
efficient hardware design suited to individual problems.


While this emphasis on greater design flexibility has opened up new directions in comput- 
ing, a number of difficult problems must be addressed before the emerging technologies can be 
effectively exploited. Probably the most significant development in easing the awesome task
of designing and implementing large systems has been the standardization of design rules and
the widespread use of standard building blocks. The design methodology expounded by Mead
and Conway [55], and the development of building blocks such as gate-arrays, PLA’s, and
ROM’s have helped shift the emphasis in circuit design from the exclusive domain of electronics
to a higher, more functional level, where aspects of circuit layout may be treated in purely
geometrical terms.

This thesis examines various aspects of the circuit layout problem. We address questions 
such as: why is circuit layout difficult, what properties of a circuit critically determine the 
quality of its layout, and what kinds of heuristics can help solve layout problems efficiently? 
These questions are motivated by the need for general techniques for laying out very large
circuits. Such basic issues must be addressed before building any automatic or computer-aided 
design and layout system. 

Although the circuit layout problem is not new, progress has been painfully slow. The 
proliferation of diverse technologies and concerns has only exacerbated the layout problem. 
On the one hand we desire to minimize layout area, signal delays, and power dissipation, while 
on the other hand we need to increase reliability by increased redundancy. In addition we 
require that custom circuits be assembled using standard configurable or restructurable chips 
as building blocks. It is not at all clear whether these different requirements are compatible or 
necessarily contradictory. 

Part I presents a general theory for VLSI graph layout. Not only does the theory identify 
structural properties of circuits that critically determine the quality of layouts, but also provides 
techniques for solving various layout problems. Perhaps the most significant result that emerges 
is a general framework for solving diverse problems in a simple and uniform manner. In 
particular, the unified framework provides a layout technique which is suitable for custom 
layout, and at the same time is efficient with regard to area, delay, and fault-tolerance. Part I
consists of Chapters 2 through 5. 

Part II examines the channel routing problem. Algorithms for channel routing form the
basis of many existing automatic layout systems. Although this problem has received wide
attention over the last decade and a number of heuristic algorithms have been proposed, none
of these is guaranteed to always determine efficient routings. Approaching this problem from
a theoretical viewpoint, we characterize completely the properties that make channel routing
difficult. Moreover, we provide a novel, linear-time algorithm that is always guaranteed to find
near-optimal solutions. Chapters 6 and 7 constitute Part II of this thesis.

Although the two parts of the thesis investigate different problems, they share a common
underlying philosophy. We begin with a theoretical characterization of the properties that make
the problems difficult. In the next step, algorithmic techniques are developed for exploiting
these properties to solve the problems. Although the results in their present form are primarily
theoretical in nature, the techniques provide new insights and approaches for VLSI layout. It 
is likely that some of the techniques can be adapted for use in practice. 

The remainder of this chapter discusses the two parts of the thesis in more detail, and
concludes with an outline of the thesis.


1.1. The Complexity of VLSI Graph Layout 


In recent years a number of interconnection networks have been proposed for solving diverse 
problems. For example, one- and two-dimensional arrays of processors are naturally suited to 
vector and matrix computations [50]. Binary trees are particularly attractive because of their 
logarithmic depth and have been proposed for a variety of applications including raster graphics 
[27], databases [75], and direct execution of applicative programming languages [54]. The 
mesh of trees [19, 44, 57] combines arrays and trees in an elegant manner. By virtue of their
sophisticated structure, networks such as the shuffle-exchange network [73], cube-connected
cycles network [63], and fast Fourier transform network [76], in which recursive algorithms
are programmed conveniently in a natural manner, are computationally more versatile and 
powerful than the simpler array structures. 

Can we exploit the power of sophisticated networks in VLSI? This question becomes 
increasingly important as problem sizes and the number of processors increase. It might
be relatively simple to fit a thousand-processor array on one chip, but can we fit a
thousand-processor shuffle-exchange network on one chip? Moreover, even if the shuffle-exchange
network fits, will its performance, determined by the clockperiod or longest delay, be comparable
to the array? To answer such questions, and to compare the relative merits of different networks, it
is necessary to develop a general theory for VLSI graph layout. 


Research in layout theory was initiated by Thompson [79, 80] who proposed a formal model
for VLSI graph layout and investigated area-time tradeoffs for computing certain functions.
Using information-transfer arguments, he obtained strong lower bounds on the layout areas of
graphs such as the shuffle-exchange and cube-connected cycles graphs. Subsequently, Leiserson
[49, 50] and Valiant [83], focusing on the problem of minimizing layout area, independently
developed a divide-and-conquer layout strategy for general classes of graphs. Using elegant
combinatorial arguments, Leighton [40, 41] showed that the bounds of Leiserson and Valiant
were the best possible in that each class contained graphs for which the bounds were, up to
constant factors, optimal. For some graphs, however, the bounds were very weak.


Layout area is not the only consideration in choosing one layout over a multitude of
possible layouts. In practice, we desire to fabricate small, inexpensive, and easily testable chips
which compute quickly and reliably. A large number of important engineering issues need to
be considered in fulfilling these (possibly conflicting) requirements. 

Propagation delays across long wires critically affect the performance of a circuit layout. In 
pipelined or systolic systems, long delays determine the clockperiod and overall performance of
the system. Since propagation delay can be reduced by decreasing wire length, it is important to 
make the longest wire in the layout as short as possible. Another way to reduce the propagation 
delay across a long wire is by increasing the size of the transistor that drives the wire; by 
carefully adjusting transistor sizes to match wire lengths, the clockperiod can be dramatically 
reduced. Since wire delays determine the efficiency of a chip, it is imperative that techniques 
to minimize delay be developed within a general theory for VLSI layout. 

Fault tolerance is another important design consideration. Fabrication processes are prone 
to errors so that every wafer invariably contains a small number of defects. Even if a wafer 
contains a number of defective processors, it may still be possible to use the wafer by configuring 
wires around the defective processors. This may, for example, be performed by laser restructuring
techniques [64]. This ability to wire together processors selectively has considerable
impact on system design. For example, how should a thousand-processor wafer be designed so
that a two-dimensional array can be realized using all the good processors, no matter how the 
defective processors are distributed? 


Another major concern is the problem of assembling large systems. Researchers have
proposed networks with as many as one million processing elements [54]. Such systems are
clearly too large to fit on a single chip. Whenever any system is larger than a single chip,
it is necessary to partition the system among several chips which can be assembled at the
printed circuit (or chip carrier) level. What is the most effective way to partition a large
system among several chips? This question is pressing because although fabrication technology
has been advancing at a rapid pace, the technology for packaging chips has been crawling in
comparison: current projections indicate as many as one hundred million components per chip
but not more than two hundred off-chip pin connections.


The economics of fabrication technology dictates that it is expensive to make one chip, 
but cheap to make many copies. For this reason, manufacturers of custom chips have been 
encouraged to make configurable designs such as gate-arrays, ROM’s, and PLA’s. The entire 
chip is manufactured, except for one mask. Given a desired configuration of the chip, a 
final layer of metallization connects up the circuitry in that way. Most of the design and 
fabrication costs are thus factored over several chips. Similarly, restructuring techniques allow 
a chip to be modified after fabrication. For example, “diode-busting” is used to configure 
PROM’s (programmable read only memory) after fabrication. More recent and exciting is the 
prospect of “laser welding” by which connections between wires can be either made or broken 
after fabrication by high-intensity laser beams. Such techniques further encourage configurable 
design of VLSI chips. Thus, we are led to consider how to design efficient layouts which may
be configured to realize, for example, arbitrary binary trees or arbitrary rectangular arrays.


Motivated by the engineering issues outlined above, Part I develops a general framework for 
VLSI graph layout. Within this framework all the diverse concerns mentioned above are dealt 
with in an efficient and uniform manner. The framework is based on a divide-and-conquer 
strategy for graph layout which differs significantly from the divide-and-conquer strategy of 
Leiserson [49, 50] and Valiant [83]. The improved strategy is based on the notion of graph 
bifurcators introduced by Leighton [42], and provides universally close bounds on important 
cost functions such as layout area and propagation delay. The results of Part I are based on 
the papers of Bhatt and Leiserson [8, 9], and Leighton [42]. In addition, the results of Chapters
4 and 5 appear in [7].


1.2. The Complexity of Channel Routing


Although the graph layout problems considered in Part I provide new insights and paradigms
for VLSI layout, they are nonetheless abstractions of layout problems encountered in practice.
Part II focuses on a specific problem confronting current automatic layout systems.


Channel routing plays a central role in automated layout systems. Most layout systems 
proceed by first placing modules on a chip, and then wiring together terminals on different 
modules that should be electrically connected. To solve the latter wiring problem, the chip
is heuristically partitioned into a set of rectangular channels, and each channel is assigned a
set of wires which are to pass through it. This effectively reduces a difficult “global” wiring
problem to a set of disjoint (and presumably easier) “local” channel routing subproblems.


An instance of the channel routing problem is specified by a set of terminals located at 
fixed positions on two horizontal tracks. Each set of terminals with the same label constitutes
a net which must be electrically connected by wires running in horizontal tracks and vertical 
columns. Figure 1.1 shows a channel with six nets. Horizontal and vertical wire segments are 
placed on two different layers of interconnect. The objective is to wire up all nets in a way 
that minimizes the channel width, which is the number of horizontal tracks used for wiring. 


For example, Figure 1.2 shows a minimum width wiring of the channel in Figure 1.1. 
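
As a minimal sketch of the problem statement above, a channel instance can be represented by two lists of terminal labels, one per horizontal track, with terminals that share a label grouped into a net. The list representation, the use of 0 for a column without a terminal, and the example labels are assumptions made here for illustration; this is not the instance of Figure 1.1.

```python
from collections import defaultdict

def nets_of_channel(top, bottom):
    """Group terminals by label. Returns {label: [(side, column), ...]}."""
    nets = defaultdict(list)
    for side, row in (("top", top), ("bottom", bottom)):
        for column, label in enumerate(row):
            if label != 0:  # 0 marks a column with no terminal (assumed convention)
                nets[label].append((side, column))
    return dict(nets)

# Hypothetical instance with three nets.
top = [1, 2, 0, 3, 1]
bottom = [2, 1, 3, 0, 3]
nets = nets_of_channel(top, bottom)
```

Each entry of `nets` lists the terminals that a router would have to connect with horizontal and vertical wire segments.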


The channel routing problem has been intensively studied for over a decade, and many 
heuristic algorithms have been proposed for solving the problem [1, 2, 11, 12, 18, 20, 21, 34, 35, 
36, 38, 51, 60, 62, 67, 68, 81, 84]. Recently, Szymanski [77] showed that the general problem is 
NP-complete, and with Yannakakis [78] showed that the problem is NP-complete even when 
every wire connects exactly two terminals. This might explain why the fast heuristic algorithms 
developed thus far produce arbitrarily bad solutions in many cases and/or completely
fail on other instances.


Part II of the thesis presents a linear-time algorithm which always produces a near-optimal
solution. This algorithm is based on the key notion of channel flux which is introduced in
Chapter 7. The algorithm originally appears in a paper by Baker, Bhatt, and Leighton [3].

Figure 1.2: A minimum width routing.


1.3. Overview 


The next four chapters are devoted to VLSI graph layout, and form Part I of the thesis. 
Chapter 2 outlines Thompson’s model for VLSI layout, reviews previous research, and describes
important layout problems in a formal setting. Chapter 3 focuses on layouts for the simplest 
of networks: binary trees. In addition to presenting new layouts with improved bounds on edge 
lengths, the complexity of producing optimal layouts is examined. The new layout strategy 
motivates the paradigm for general graph layout presented in Chapter 4. Finally, Chapter 
5 shows how the new layout paradigm can be used to efficiently solve the important layout 
problems of Chapter 2. 

Part II of the thesis consists of Chapters 6 and 7. Chapter 6 describes the channel routing 
problem, its use in automatic layout systems, and briefly reviews previous research. Chapter 7 
introduces the concept of channel flux and presents a linear-time approximation algorithm for 
Manhattan routing. 

In conclusion, Chapter 8 summarizes the major results of both parts and outlines a number
of important, unresolved problems.


CHAPTER 2 


Issues in VLSI Graph Layout 


The first three sections of this chapter introduce the layout model developed by Thompson
[79, 80] and briefly review previous research in VLSI graph layout. In particular, we discuss the
layout strategy of Leiserson [49] and Valiant [83] and note that bounds on layout area based on
separator theorems can be very different from the actual minimum layout area. The remainder
of this chapter is devoted to formalizing a number of layout questions motivated by engineering
considerations.


2.1. The Layout Model 


In order to cast VLSI layout problems within a mathematical framework, Thompson [79, 
80] developed a formal model for VLSI graph layout. The model is based on, and is consistent
with, the VLSI design rules established by Mead and Conway [55]. It is also similar to the
widely used Manhattan wiring model. In the Thompson grid model, a layout for a graph is
characterized as an embedding within a two-dimensional grid. A two-dimensional grid is a
collection of horizontal and vertical tracks spaced apart at unit intervals. A layout for a graph
G is specified by an embedding which assigns nodes of G to points in the grid where horizontal
and vertical tracks intersect, together with an (incidence-preserving) assignment of the edges
of G to paths in the grid. The paths of the layout are restricted to follow along grid tracks
and are not allowed to overlap for any distance (although a vertical path segment may cross
a horizontal path segment). In addition, the paths may not cross nodes to which they are not
adjacent. For obvious reasons, we restrict our attention to graphs in which no node has degree
greater than four. As an example, Figure 2.1 shows a layout for the complete graph on four
nodes.

Figure 2.1: A layout for K4.

Remark. The results of this thesis extend to variants and generalizations of the Thompson
grid model. For example, graphs with bounded valence greater than four may be laid out by
mapping each node to a region of the grid, instead of a single grid point. The results are also
applicable to networks with large processors. Techniques for dealing with large processors are
described more fully in Chapter 5.


2.2. Elementary Bounds on Layout Area 


Although there are a variety of important engineering considerations in choosing one layout 
for a graph over other possible layouts, the best understood, and perhaps the most desirable
cost measure to minimize is layout area. The area of a layout is most naturally defined as 
the area of the “bounding-box” around the layout, and equals the product of the number of 
vertical tracks and the number of horizontal tracks that contain a node or wire segment of the 
graph. For example, the layout of Figure 2.1 has area 15. This is not the minimum possible; 
there is another layout with area 9. 
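
The grid model and the bounding-box definition of area can be made concrete with a small sketch. The data representation below (a dictionary of node positions and a dictionary of edge paths given as lists of grid points) is an assumption chosen for illustration, and the check covers only the rules stated in Section 2.1. The example is a four-node cycle, not the layout of Figure 2.1.

```python
def layout_area(nodes, paths):
    """Bounding-box area: number of vertical tracks times number of horizontal tracks."""
    points = list(nodes.values()) + [p for path in paths.values() for p in path]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)

def is_valid_layout(nodes, paths):
    """Check the grid-model rules for node placements and edge paths."""
    node_points = set(nodes.values())
    if len(node_points) != len(nodes):
        return False  # two nodes placed on the same grid point
    used_segments = set()
    for (u, v), path in paths.items():
        if path[0] != nodes[u] or path[-1] != nodes[v]:
            return False  # a path must join the two endpoints of its edge
        if any(p in node_points for p in path[1:-1]):
            return False  # a path passes through a node in its interior
        for (x1, y1), (x2, y2) in zip(path, path[1:]):
            if abs(x1 - x2) + abs(y1 - y2) != 1:
                return False  # paths must follow unit steps along grid tracks
            segment = frozenset(((x1, y1), (x2, y2)))
            if segment in used_segments:
                return False  # two paths overlap along a track
            used_segments.add(segment)
    return True

# Hypothetical example: a 4-cycle laid out on a 2 x 2 block of grid points.
nodes = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
paths = {
    ("a", "b"): [(0, 0), (1, 0)],
    ("b", "c"): [(1, 0), (1, 1)],
    ("c", "d"): [(1, 1), (0, 1)],
    ("d", "a"): [(0, 1), (0, 0)],
}
```

Here the layout occupies two vertical and two horizontal tracks, so `layout_area` returns 4, which matches the trivial lower bound of one unit of area per node.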

How much area does an N-node graph require? Clearly, the area cannot be less than
the number N of nodes. On the other hand, by embedding nodes at equally spaced intervals
along a line, and using a distinct horizontal track for each edge (as shown in Figure 2.2), it is
clear that the area required for an N-node graph is no greater than $O(N^2)$. These bounds are
independent of the structure of the graph and hold for all N-node graphs. In general, however,
the minimum area needed to lay out a graph depends on the graph.

Figure 2.2: Every N-node graph can be laid out in $O(N^2)$ area. (Both sides of the layout are
labeled O(N).)
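
The counting behind this upper bound can be sketched as follows. The function only computes the dimensions of such a layout; it does not route the wires. The constant of four columns per node (one for each possible incident edge) and the single row reserved for the nodes are assumptions of this sketch, not values taken from the figure.

```python
def line_layout_dimensions(num_nodes, edges, columns_per_node=4):
    """Width and height of a layout with nodes on a line and one track per edge."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    if max(degree, default=0) > 4:
        raise ValueError("the grid model assumes degree at most four")
    width = columns_per_node * num_nodes  # O(N) vertical tracks
    height = len(edges) + 1               # one horizontal track per edge, plus the node row
    return width, height

# The complete graph on four nodes has six edges.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
width, height = line_layout_dimensions(4, k4_edges)
area = width * height
```

Since every node has degree at most four, a graph has at most 2N edges, so the height is at most 2N + 1 and the area is at most 4N(2N + 1), which is $O(N^2)$.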


Thompson [79, 80] identified bisection width as an important property of graphs that affects
minimum layout area. The bisection width of an N-node graph is the minimum number of
edges which must be removed from the graph in order to disconnect it into two subgraphs
each of size at least $\lfloor N/2 \rfloor$. Thompson showed that, up to a constant factor, the layout area
can be no less than the square of the bisection width. Therefore, if the bisection width for 
a graph is known, a lower bound on area can be easily computed. By showing that certain 
computationally powerful graphs such as the shuffle-exchange graph have large bisection width, 
Thompson showed that these graphs require large area. In fact, Thompson extended this 
observation to obtain area-time tradeoffs for computing certain functions. 
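
For very small graphs, the definition of bisection width can be checked directly by exhaustive search. The sketch below is exponential in N and is meant only to illustrate the definition; the function name and the example graphs are chosen here for illustration. Squaring the result gives the lower bound on area up to the unspecified constant factor.

```python
from itertools import combinations

def bisection_width(num_nodes, edges):
    """Minimum number of edges between a set of floor(N/2) nodes and the rest."""
    best = None
    half = num_nodes // 2
    for side in combinations(range(num_nodes), half):
        side = set(side)
        # Count the edges with exactly one endpoint in the chosen half.
        cut = sum(1 for u, v in edges if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best

k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
cycle8_edges = [(i, (i + 1) % 8) for i in range(8)]
k4_width = bisection_width(4, k4_edges)
cycle_width = bisection_width(8, cycle8_edges)
```

Fixing one side to have exactly $\lfloor N/2 \rfloor$ nodes covers every admissible split, because the other side then has $\lceil N/2 \rceil$ nodes.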

Leighton [40, 41] identified crossing number as another general property that affects layout 
area. The crossing number of a graph is defined as the minimum number of edge crossings in 
any drawing of the graph in the plane. It is easy to see that the crossing number of a graph is a
lower bound on layout area. Using more sophisticated arguments for special graphs, Leighton
also directly obtained lower bounds on total wire length (the sum of the lengths of the wires
in a layout), which of course is a lower bound on layout area. These techniques are heavily
dependent on the recursive structure of the special graphs and are generalized in [7].


2.3. Layouts Based on Separator Theorems 


Leiserson [49, 50] and Valiant [83] investigated general properties that provide effective 
upper bounds on layout area. They independently developed a divide-and-conquer strategy for
graph layout and showed, for example, that every N-node tree can be laid out in O(N) area
and that every N-node planar graph can be laid out in $O(N \lg^2 N)$ area. Their technique is
based on the notion of separator theorems for graphs.


Definition: A class of graphs which is closed under the subgraph relation is said to have
an $f(x)$-separator theorem if there exist constants $a$ and $b$, where $0 < a < 1/2$ and $b > 0$,
such that every N-node graph in the class can be partitioned (by the removal of at most
$b f(N)$ edges of the graph) into disjoint subgraphs having $a'N$ and $(1 - a')N$ nodes, where
$a < a' < 1 - a$.

Given a class of graphs for which a separator theorem is known (e.g., trees have a
1-separator theorem [52] and planar graphs have a $\sqrt{x}$-separator theorem [53]), it is possible to
construct a layout for any N-node graph in the class by using a simple divide-and-conquer
approach. For example, Leiserson [49, 50] proved the following upper bounds on layout area.


$x^{\alpha}$-separator theorem      Layout area
$\alpha < 1/2$                      $O(N)$
$\alpha = 1/2$                      $O(N \lg^2 N)$
$\alpha > 1/2$                      $O(N^{2\alpha})$
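
The three regimes in the table can be reproduced numerically from a simple recurrence for the side length of a square layout. The recurrence used below, L(n) = 2 L(n/4) + c n^alpha with L(1) = 1, is an assumption of this sketch: the text does not state it, and it is only meant to model a divide-and-conquer layout that splits the graph into four parts and adds tracks for the removed edges.

```python
def side_length(n, alpha, c=1.0):
    """Side length from the assumed recurrence L(n) = 2 L(n/4) + c * n**alpha, L(1) = 1."""
    if n <= 1:
        return 1.0
    return 2.0 * side_length(n // 4, alpha, c) + c * n ** alpha

def layout_area_bound(n, alpha):
    """Area of the square layout whose side is given by the recurrence."""
    return side_length(n, alpha) ** 2

n = 4 ** 10
ratios = {
    "alpha=0, area/N": layout_area_bound(n, 0.0) / n,
    "alpha=1, area/N^2": layout_area_bound(n, 1.0) / n ** 2,
}
```

Under this recurrence the side length is about $2\sqrt{N}$ for alpha = 0, exactly $\sqrt{N}(1 + \log_4 N)$ for alpha = 1/2 when N is a power of four, and about 2N for alpha = 1, which agrees with the area bounds in the table.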


Remark. The layout procedure assumes that a complete recursive decomposition of the graph 
is given. If a complete decomposition is not given, then there is no known polynomial time 
algorithm which achieves the upper bounds on area. This severely limits the applicability of 
separator-based layout strategies to classes of graphs (such as trees or planar graphs) for which
decompositions are easily computed.


How good are the preceding area bounds? Thompson [79, 80] and Leighton [40, 41] showed
that none of the bounds can be improved. More precisely, they showed that within each class
there is a graph for which the bound is optimal. But this does not mean that the bounds are
optimal for every graph within a class. In fact, while the bounds are existentially optimal,
they are not universally optimal. For example, an N-node square grid can be laid out in area
linear in N, but since the minimum separator theorem for the class of square grids is $\sqrt{x}$, the
best bound obtainable by separator-based layouts is $O(N \lg^2 N)$, which is off by a factor of
$O(\lg^2 N)$ from the optimal. Of course, since N-node graphs require area at least N, the bounds
for graphs with $x^{\alpha}$-separator theorems, $\alpha < 1/2$, are asymptotically universally optimal.

For graphs with larger separator theorems, the discrepancy between the minimum layout 
area and that given in the table can be much worse. Consider, for example, the N-node graph 
$S_N$ which consists of $N/\lg N$ disjoint $\lg N$-node expander graphs. An m-node expander graph
has the property that every subset of $k$ nodes is linked by $\Omega(\min(k, m-k))$ edges to the $m-k$
nodes outside the subset.* The bisection width of such a graph is $\Omega(m)$, and hence the minimum
separator theorem is $\Omega(x)$. The existence of trivalent graphs that satisfy this definition has been
known for a long time [28, 31]. In fact, almost all trivalent graphs satisfy this definition. Since
each $\lg N$-node expander graph can be trivially laid out in $O(\lg^2 N)$ area, the layout area of
$S_N$ is no greater than $O(N \lg N)$. However, Leighton [42] showed that the minimum separator
theorem for the class of graphs $S_N$ is $\Omega(x/\lg^2 x)$, so that the area bound from the table
above is $O(N^2/\lg^4 N)$, which is much worse than the optimal bound of $O(N \lg N)$.
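
To get a feel for the size of this gap, the two expressions can be compared numerically. The sketch below ignores all constant factors, which the asymptotic bounds leave unspecified, so the numbers are indicative only. The ratio of the two bounds is $N/\lg^5 N$, which is below one for moderate N and grows without limit afterwards.

```python
import math

def optimal_area_bound(n):
    """N lg N, the area achieved by laying out each expander separately."""
    return n * math.log2(n)

def separator_area_bound(n):
    """N^2 / lg^4 N, the bound obtained from the separator-based table."""
    return n ** 2 / math.log2(n) ** 4

def gap_ratio(n):
    """Ratio of the separator-based bound to the optimal bound, N / lg^5 N."""
    return separator_area_bound(n) / optimal_area_bound(n)

gaps = {n: gap_ratio(n) for n in (2 ** 20, 2 ** 30, 2 ** 40)}
```

With constants ignored, the ratio is about 0.33 at N = 2^20 but exceeds 10,000 at N = 2^40, which illustrates that the discrepancy is an asymptotic one.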


Remark. Any class of graphs closed under the subgraph relation and containing $S_N$ must
also contain expander graphs. Hence, the minimum separator theorem (as defined earlier) for
the class is $\Omega(x)$. Instead of defining separator theorems for classes of graphs closed under the
subgraph relation, it is more convenient (and general) to define separators for individual graphs
in terms of the subgraphs produced by their recursive decomposition. Using the less restrictive
(but more useful) definition, it is possible to show that $S_N$ has an $O(N/\lg N)$-separator. The
$\lg N$-node expander graphs are split in the upper levels of the decomposition and never appear
intact as subgraphs in the lower levels of the decomposition. Leighton [42] proved that even
using the most liberal definition, the minimum separator for $S_N$ is at least $\Omega(N/\lg^2 N)$. Any
bound on layout area for $S_N$ based on the minimum separator can therefore be no less than
$\Omega(N^2/\lg^4 N)$.


Thus, while the divide-and-conquer strategy based on separator theorems gives existentially
optimal bounds, the bounds can be unacceptably poor in a universal sense. It was the discovery
of such large discrepancies that led to the search for an alternative framework for VLSI layout.
Within the new framework presented in Chapter 4 we shall see how these large discrepancies
are overcome.

*The original definition of expander graphs is slightly different from that given here. We adopt this minor
variant because it allows nodes of degree no greater than three.


2.4. Eight VLSI Graph Layout Problems 


As mentioned earlier, there are many important considerations in choosing one layout over 
a multitude of other possible layouts. The problems in this section are motivated by some 
engineering concerns fundamental to circuit design and layout. Though not exhaustive, this 
list covers most of the theoretical issues studied recently. Many of the problems are known 
to be NP-complete. The emphasis throughout this thesis is the development of a general
unifying framework for dealing with diverse issues in a uniform manner. Within the framework,
solutions to some problems are reasonably close to optimal. For other problems, good heuristics
are developed or suggested, and general bounds obtained.


Problem 1. Given a graph G, produce an area-efficient layout for G. 


As mentioned before, minimizing area is a critical concern in VLSI circuit layout. In 
addition to the work on area-efficient layouts described in the previous section, Dolev, Leighton,
and Trickey [22] have shown that determining the minimum layout area of a forest of trees is
NP-complete.


Problem 2. Given a graph G, produce an area-efficient layout for G with minimax edge
length.


Besides area, speed is another critical factor in chip performance. Signals do not propagate 
instantaneously across wires, and the longer the wire, the longer the propagation delay. In
pipelined or systolic systems, the effect of propagation delays is even more dramatic. The
maximum delay determines the clockperiod, and hence the throughput, of the system. To
maximize throughput we need to minimize the maximum delay. In short, we must produce
layouts so that the longest edge is as short as possible. The minimum, over all layouts, of the
length of the longest edge is called the minimax edge length.

Paterson, Ruzzo and Snyder [59] studied the problem of minimizing edge lengths for 
complete binary trees. They showed that the minimax edge length of an N-node complete 
binary tree is $O(\sqrt{N}/\lg N)$. Adopting a different strategy based on separator theorems, the
next chapter presents a general technique for bounding the maximum edge length of arbitrary
trees, while Chapters 4 and 5 extend the techniques to general graphs. The next chapter also
shows that minimizing the edge lengths of trees is NP-complete.


Problem 3. Given a graph, produce an area-efficient layout in which each wire has
bounded delay in the capacitive model.


Although it is certainly true that propagation delay across a wire depends on the length 
of the wire, there has been little consensus on how fast propagation delay grows as a function
of wire length. Thompson [79, 80] assumes propagation delay to be constant, independent of
wire length. This might seem unreasonable given the ultimate speed-of-light limitation which
indicates that the delay increases linearly with length. The speed-of-light limitation, however,
greatly exaggerates the importance of wire delay in determining the speed of circuits. Mead
and Conway [55] take into account some of the electrical characteristics of interconnections on 
MOS integrated circuits, and emphasize the role of wire capacitance in determining propagation 
delay. Recent analysis by Bilardi, Pracchi, and Preparata [10] strongly supports the belief that 
capacitive effects play the predominant role in determining the speed of MOS circuits. 

In a capacitive model, each wire is assumed to present a purely capacitive load to the
transistor that drives a signal across the wire. This load is proportional to the length of the
wire plus the area of the transistor that receives the signal. The delay is proportional to
this load divided by the area of the driving transistor. By increasing the size of the driving
transistor it is therefore possible to bound the propagation delay, independent of the length of
the wire. A second well-known technique for reducing delay across a long wire is to “ramp”
the wire with a geometrically increasing series of inverters [55]. The number of intermediate
drivers, and hence the delay, is logarithmic in the length of the wire, but an attractive feature
is that this process can be carried out without the need to resize the original transistors in the
circuit.

Of course, increasing the size of one transistor or introducing new transistors might force 
some wires to be stretched to avoid the enlarged transistor area. In other words, decreasing
the delay across one wire might force an increase in delay over other wires. Leiserson [47] and
Mehlhorn [56] independently posed the question of whether or not the transistors in a layout
could be resized so that every wire in the layout has constant propagation delay. Ramachandran
[65] investigated the problem of introducing intermediate drivers along long wires to decrease
delays, but under the constraint that the topology of the layout remain unchanged. With the
restriction that wires cannot be rerouted, she showed that logarithmic delay can be achieved,
but at the expense of squaring the layout area in the worst case. We allow the layout topology
to be changed, and obtain significantly better results.
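
The capacitive model described above can be sketched in a few lines. The proportionality constant `k`, the stage ratio of the ramp, and the rule that the ramp stops once the last driver is as large as the wire is long are all assumptions made for illustration; the text specifies only the proportionalities and the logarithmic number of stages.

```python
def wire_delay(wire_length, receiver_area, driver_area, k=1.0):
    """Delay proportional to (wire length + receiver area) / driver area."""
    return k * (wire_length + receiver_area) / driver_area

def ramp_stages(wire_length, ratio=2.0):
    """Number of drivers in a geometric ramp of sizes 1, ratio, ratio**2, ..."""
    stages = 1
    size = 1.0
    while size < wire_length:  # grow until the last driver matches the wire length
        size *= ratio
        stages += 1
    return stages

# A unit-size driver on a long wire versus a driver scaled to the load.
slow = wire_delay(wire_length=100, receiver_area=1, driver_area=1)
fast = wire_delay(wire_length=100, receiver_area=1, driver_area=101)
stages = ramp_stages(1024)
```

Scaling the driver to the load brings the delay back to a constant, while the ramp uses a number of stages that grows with the logarithm of the wire length.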


Problem 4. Given a graph G, produce a layout for G with few wire crossings.


An undesirable feature of layouts is the presence of a large number of wire crossings.
When two wires cross, they must be on different layers. For faster operation, and less power 
dissipation, it is advantageous to maximize the total amount of wiring on a layer of low 
resistance, e.g. the metal layer, while minimizing the wiring on a layer of high resistance, 
e.g. the polysilicon layer. The net wiring on one layer may be reduced by laying wires on that 
layer only just before and after two wires cross. If the number of wire crossings is small, the 
number of contact-cuts which connect wire segments on different layers is small so that the area 
of the layout is not blown up by the contact cuts which occupy large area. In addition, long 
wires that are crossed by many other wires are susceptible to cross-talk when all the crossing 
wires simultaneously carry the same signal. 

The crossing number of a graph is defined to be the minimum number of wire crossings in 
any drawing of the graph on the plane. Leighton [40, 41] proved upper and lower bounds on 
crossing numbers and then used the results to find bounds on layout area. Garey and Johnson
[29] showed that determining the crossing number of bipartite graphs is NP-complete.

Problem 5. Given a graph, produce an area-efficient regular layout for the graph.


Some design methodologies, most notably gate-arrays, require that processors be located 
at fixed positions on a chip. In gate-arrays the processors are placed in a grid pattern with 
uniform spacing between processors adjacent along every row and column. Such layouts are 
said to be regular. An important advantage of this design restriction is its flexibility: even if 
the size of every processor is increased, the wiring between processors remains unaffected and 
the total area remains proportional to the sum of the wire area (as computed with unit-size 
processors) and the processor area. This is because only the $\sqrt{N}$ rows and columns containing
the N unit-size processors need to be expanded to accommodate the non-unit-size processors. In
non-regular layouts, every row and column might have to be expanded since there might be a
node in every row and in every column. Increasing the linear dimension of the processors by a
factor of s could result in an $O(s^2)$ increase in layout area.
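
As a rough illustration of this argument, the sketch below compares the two cases. It assumes a square layout of side `side` holding `n` unit-size processors, with `n` a perfect square; the function names and the sample numbers are ours and purely illustrative.

```python
import math

def regular_area(side, n, s):
    """Area after enlarging n processors to side s in a regular layout.

    Only the sqrt(n) rows and sqrt(n) columns that hold processors grow,
    each by (s - 1) units. Assumes n is a perfect square.
    """
    rows = math.isqrt(n)
    new_side = side + rows * (s - 1)
    return new_side * new_side

def worst_case_irregular_area(side, s):
    """Worst case for a non-regular layout: every row and column holds a
    processor, so every row and column is stretched by the factor s."""
    return (side * s) ** 2

# Hypothetical numbers: a 100 x 100 layout with 100 processors, enlarged 5x.
print(regular_area(100, 100, 5))
print(worst_case_irregular_area(100, 5))
```

With these sample values the regular layout grows from area 10,000 to 19,600, while the worst-case figure for a non-regular layout is 250,000.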

Previous divide-and-conquer layout strategies do not produce regular layouts. Hence, they
are not useful in laying out circuits with non-unit-size processors. A good strategy for producing
regular layouts would solve the nagging problem of how to cope with variable-size processors.


Problem 6. Design area-efficient chips that can be configured to realize a large number
of graphs.


Because it is expensive to make one chip but cheap to make many copies, manufacturers of 
custom chips have been encouraged to make configurable designs such as gate-arrays, ROM’s 
and PLA’s. In such designs, the entire chip is prefabricated except for one layer. The customer 
then specifies a configuration for the chip, and the final layer of metalization connects up 
the circuitry in that particular way. Hence, most of the design and fabrication costs can be 
factored over many custom chips. Similarly, the fast emerging laser-restructuring technology 
[64] provides another economical way to customize chips after fabrication is complete. Laser 
restructuring allows connections between wires to be made or broken after the chip has been 
fabricated. In either case, it is desirable to design layouts that can be configured from one of
a few basic patterns.

Problem 7. On a wafer which has arbitrarily distributed defective cells, realize a given
graph on the good cells.


In any fabrication process, it is expected that some of the processing cells will be defective. 
In a two-dimensional array of cells on a wafer in which defective cells are arbitrarily distributed, 
it may still be possible to use the wafer by configuring wires around the defective cells. This 
may, for example, be performed by laser restructuring techniques [64]. Given this ability to 
isolate defective cells, it is important to consider how a graph may be realized on the remaining 
good cells. This problem has received considerable attention recently [33, 45, 69]. The problem
is similar to the general graph layout problem in the Thompson model but with the important
restriction that nodes of the circuit can only be mapped to a restricted set of nodes in the grid.


Problem 8. Given a graph G, assemble G using the minimum number of copies of a
single chip having few external pin connections.


A number of very large networks have been proposed in recent years for implementing 
priority queues [48], for searching [5], for direct execution of applicative programming languages 
[54], and for recognizing regular expressions [26]. Some of these networks are too large to fit
on a single chip. For example, the tree-structured network of [54] is envisioned to contain
as many as one million processing elements. Clearly, such networks must be partitioned over 
many interconnected chips, so that each chip realizes a small portion of the network. 

The technology for packaging chips severely limits the number of external pin connections 
on a chip. While chips with over a million components are foreseeable in the near future, no one
predicts a chip with over two hundred external pin connections. This poses a pressing problem 
in assembling large networks of processors. 

Even if a network could be partitioned so that each portion has only a few external
connections, it would be economically infeasible to design each chip individually. For instance,
it would be prohibitively expensive to design one thousand different chips, each containing a
thousand processing elements, to assemble a network of one million processors. For this reason,
it is necessary to assemble large systems using copies of a few configurable or restructurable
chips. The next chapter presents one solution to the problem of assembling large tree structures
using copies of a single, area-efficient, restructurable chip with few external pin connections.


Within the new framework, efficient solutions are provided for each of these problems. In
fact, a single layout simultaneously solves many of these problems efficiently. The framework
provides a two-step strategy for solving these problems. First, the graph to be laid out is
embedded within a very special network called the tree of meshes. For the tree of meshes it is
possible to solve all these problems efficiently. In the second step, therefore, a good layout for
the tree of meshes also solves these problems for the embedded graph.


CHAPTER 3 


Layouts for Trees


A binary tree may not be the best multiprocessor organization, but it has been proposed by 
many researchers for a variety of reasons. For example, a complete binary tree can be the major 
component of a priority queue resource [48] and of a smart-memory raster graphics system [27]. 
A complete binary tree can also serve as a hardware structure for searching [5], for databases 
[75], or for direct execution of applicative programming languages [54]. Browning [15] proposes 
a complete binary tree for general-purpose multiprocessing, and two systems based on her ideas
are being built at Caltech and Bell Laboratories.


Attention is also directed to binary trees which are not complete. Floyd and Ullman [15] 
show that strings described by a regular expression can be recognized by processing elements
organized as the parse tree of the regular expression. Foster and Kung [25] have a similar 
scheme based on the simple configurable layout developed by Leiserson [50]. There are other 
proposals, for example [58, 74], of machine organizations that, while not trees, are nevertheless 
tree-like. 

We shall not debate the merits of the various tree machines here, but shall confine ourselves 
to understanding their physical organization. In this regard trees are particularly attractive 
because of their simple interconnection structure. Not only can trees be laid out efficiently, but 
good layouts for trees also suggest efficient ways to lay out general graphs. Moreover, problems
that are intractable for trees are also intractable in general. Thus, by investigating layouts for
trees we stand to learn more about general graph layout.


In the following section we examine two well-known layouts for complete binary trees and
present a better layout which minimizes (asymptotically) both area and maximum edge
length. These bounds are extended to arbitrarily structured trees in Section 3.2, and to planar
layouts for trees in Section 3.3. Computing the minimum edge length exactly is shown to be
NP-complete in Section 3.4. Section 3.5 describes Leiserson’s [50] assembly of large complete
trees using multiple copies of a single chip with only four external pin connections. Section
3.6 introduces and examines the two-color bisection problem for arbitrary trees. Section 3.7
presents one way to assemble large arbitrarily structured trees using the minimum number of
copies of a single restructurable chip with few pins.

Figure 3.1: An O(N lg N) area layout of a complete binary tree.


3.1. Layouts for Complete Binary Trees 


In addition to their usefulness in speeding up computation time by allowing both parallelism
and pipelining, complete binary trees are attractive also because they can be laid out
efficiently. Figure 3.1 shows the naive layout of a complete binary tree. Since the height of an
N-leaf tree is lg N, and the N leaves are spread out over a line of length 2N, it follows that
the area of the layout is 2N lg N. Furthermore, the longest edges are at the top level and their
length is N/2.

The familiar H-tree layout in Figure 3.2 was originally proposed by Mead and Rem [55].
In contrast to the naive layout which, in a sense, is one-dimensional, this layout exploits both
dimensions symmetrically. If S(N) is the side of the layout, then we have that S(1) = 1 and,
more generally,


S(N) = 2S(N/4) + 1,


which yields $S(N) = 2\sqrt{N} - 1$. Consequently, the area of the layout is no greater than 4N.
The longest edges are again at the top level, and their length is no more than $\frac{1}{2}\sqrt{N}$.

Figure 3.2: The H-tree layout of a complete binary tree.


The H-tree layout asymptotically minimizes area but not maximum edge length. Paterson,
Ruzzo, and Snyder [59] demonstrated a linear-area layout with maximum edge length
$O(\sqrt{N}/\lg N)$. In any layout there are two nodes which are distance $\sqrt{N}$ apart; moreover, these
two nodes are connected by a path containing no more than 2 lg N tree edges. It follows then
that at least one of these edges must have length at least $\sqrt{N}/(2\lg N)$. Thus, the layout of [59]
asymptotically minimizes area as well as maximum edge length. Unfortunately, however, the
layout technique of [59] does not extend to more general graphs. The remainder of this section
demonstrates another layout with asymptotically optimal area and maximum edge length. The
following section generalizes our technique to arbitrary trees, and the next chapter to general
graphs.
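
The H-tree recurrence and the lower-bound argument can be checked numerically. The following is a minimal sketch, assuming N is a power of 4; the function names are ours. It confirms the closed form for the side of the H-tree and shows that the longest H-tree edge exceeds the lower bound by a factor of lg N.

```python
import math

def h_tree_side(n_leaves):
    """Side S(N) of the H-tree, from S(1) = 1 and S(N) = 2*S(N/4) + 1.

    n_leaves is assumed to be a power of 4.
    """
    if n_leaves == 1:
        return 1
    return 2 * h_tree_side(n_leaves // 4) + 1

def edge_length_lower_bound(n_leaves):
    """sqrt(N) / (2 lg N): two nodes sqrt(N) apart are joined by a path
    of at most 2 lg N edges, so some edge is at least this long."""
    return math.sqrt(n_leaves) / (2 * math.log2(n_leaves))

for k in range(1, 11):
    n = 4 ** k
    side = h_tree_side(n)
    assert side == 2 * math.isqrt(n) - 1   # closed form 2*sqrt(N) - 1
    assert side * side <= 4 * n            # area no greater than 4N
    longest = math.sqrt(n) / 2             # bound on the top-level H-tree edge
    # The ratio of the two quantities is lg N, the gap the new layout closes.
    print(n, side, longest / edge_length_lower_bound(n))
```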

To illustrate our technique, consider the layout of Figure 3.3 in which the nodes at the 
second and third levels of the tree have been brought closer to the root so that all edges within 
the top four levels are of equal length. This “averaging” of edge lengths reduces the maximum 
accommodate two edges instead of one. This increases the area of the layout, but only slightly,
from 4N to $4N + 6\sqrt{N}$.

This averaging operation can be carried out further down the tree so that many levels
are brought closer towards the root. In order to space top levels of the tree closely together,
we embed these levels within an H-channel structure shown in Figure 3.4. This structure is
obtained by taking the H-tree layout of a complete binary tree and blowing up the layout in
both dimensions by a suitable factor. The details of the embedding are described next.

Figure 3.3: The H-tree layout with shorter edges at the top levels.

Figure 3.4: The H-channel structure.


Theorem 3.1. An N-node complete binary tree can be embedded in linear area with maximum
edge length $O(\sqrt{N}/\lg N)$.


Proof. To lay out a complete binary tree with N leaves, start with the H-tree layout of
a complete binary tree with $\lg^2 N$ leaves, which has area $4\lg^2 N$ and maximum edge length
$\frac{1}{2}\lg N$. Blow up this layout in either dimension by a factor $\alpha\sqrt{N}/\lg N$, where $\alpha$ is a constant
specified later. The area of the layout becomes $4\alpha^2 N$ and the longest channel has length
$\frac{1}{2}\alpha\sqrt{N}$.

Next, lay the root at the centre of the H-channel structure and place the second level nodes
at distance $\beta\sqrt{N}/\lg N$ from the root on either side. Once again, $\beta$ is a constant specified later.
Place lower levels of the tree as shown in Figure 3.4, with successive levels spaced equally apart.
At every corner of the H-channel structure, bisect the tree so that the subtrees embedded
within the two substructures are of equal size. Finally, in the lowest level channels lay out the
remaining subtrees in the H-tree manner.

We must ensure that every channel is wide enough to accommodate all the nodes in any
level embedded within it, and also that the H-tree layouts in the final step fit within the lowest
level channels. To satisfy these conditions, let us first calculate the total number of tree levels
embedded in all but the lowest level channels. The total length of all channels encountered
from the centre of the layout to the end of a terminal channel does not exceed the quantity
$2\alpha\sqrt{N}$. Since the distance between successive tree levels is $\beta\sqrt{N}/\lg N$, the number of tree
levels embedded is bounded by $(2\alpha/\beta)\lg N$. The total number of tree nodes within any one
of these levels is therefore no greater than $N^{2\alpha/\beta}$. If $2\alpha/\beta < 1/2$ then the number of nodes
in any level is asymptotically less than the width of a channel, which equals $\beta\sqrt{N}/\lg N$. The
first condition is therefore satisfied by having $\alpha < \beta/4$.

To ensure that the H-tree layouts at the final step fit within the final channel, it suffices
to check that the dimensions of the layout are smaller than the dimensions of the channel.
The size of a subtree embedded within a final-level channel cannot be more than $N/\lg^2 N$
because the tree is split in half at each corner. The side of the H-tree layout is no greater
than $2\sqrt{N}/\lg N$. By choosing $\alpha > 2$, the side of the channel is guaranteed to be larger than
a side of the H-tree layout. Therefore, by choosing $\alpha > 2$ and $\beta > 4\alpha$, we see that the layout
can be completed. Finally, the area is linear in N and the maximum edge length is bounded
by $O(\sqrt{N}/\lg N)$. ∎
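
To make the two conditions of the proof concrete, the sketch below evaluates the quantities being compared for one choice of N, $\alpha$ and $\beta$. The function name and the sample values ($\alpha = 3$, $\beta = 13$, $N = 2^{20}$) are ours, chosen only to satisfy $\alpha > 2$ and $\beta > 4\alpha$; the proof itself is asymptotic, so a single numerical instance is an illustration rather than a verification.

```python
import math

def theorem_3_1_quantities(n, alpha, beta):
    """Evaluate the quantities compared in the proof for concrete values."""
    lg_n = math.log2(n)
    root_n = math.sqrt(n)
    return {
        # number of tree levels placed inside the channels
        "levels": (2 * alpha / beta) * lg_n,
        # bound on the nodes in any one level, N^(2*alpha/beta)
        "nodes_per_level": n ** (2 * alpha / beta),
        # channel width as stated in the proof, beta * sqrt(N) / lg N
        "channel_width": beta * root_n / lg_n,
        # side of a final-level channel after the blow-up
        "final_channel_side": alpha * root_n / lg_n,
        # side of an H-tree on N / lg^2 N leaves, at most 2 * sqrt(N) / lg N
        "final_h_tree_side": 2 * root_n / lg_n,
    }

q = theorem_3_1_quantities(n=2 ** 20, alpha=3, beta=13)
assert q["nodes_per_level"] < q["channel_width"]          # first condition
assert q["final_h_tree_side"] < q["final_channel_side"]   # second condition
```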


3.2. Layouts for Arbitrary Binary Trees 


One property of complete binary trees crucial to the layout of Theorem 3.1 is that a
complete binary tree can be bisected into two equal size subtrees simply by removing the root.
At every corner in the H-channel structure, a forest of complete trees is bisected into two equal
halves, each “growing” in opposite directions. This controls the size of every subgraph at the
final level so that a standard layout fits within a final-level channel.

Arbitrarily structured binary trees are only slightly harder to bisect. Any N-node binary
tree can be separated into two components, each with no more than $\lfloor 2N/3 \rfloor + 1$ nodes, by
removing a single edge [52]. (The worst case occurs for the four-node tree in which one node is
adjacent to three others.) Either of the two components might be a forest, but the same result
applies to forests, so that the binary tree can be split recursively. By recursively splitting the
larger component, a tree can be bisected by cutting at most O(lg N) edges, or by removing
the nodes incident to these edges. The O(lg N) bound follows because the subgraphs decrease
geometrically in size with each cut.
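
The single-edge split can be found by a direct search. The sketch below is a minimal brute-force version, assuming the tree is given as an adjacency dictionary `adj` with at least two nodes (a hypothetical representation of ours, not taken from [52]); it tries every edge and keeps the most balanced cut.

```python
def best_edge_cut(adj):
    """Return (larger_side_size, edge) for the single edge whose removal
    splits the tree into the most balanced pair of components."""
    root = next(iter(adj))
    parent = {root: None}
    order = [root]
    for node in order:                   # breadth-first; order grows as we go
        for neighbour in adj[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                order.append(neighbour)
    size = {v: 1 for v in adj}
    for node in reversed(order[1:]):     # children are handled before parents
        size[parent[node]] += size[node]
    n = len(adj)
    # Cutting the edge (parent[v], v) leaves components of size[v] and n - size[v].
    best = min(order[1:], key=lambda v: max(size[v], n - size[v]))
    return max(size[best], n - size[best]), (parent[best], best)

# The four-node tree in which one node is adjacent to three others.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(best_edge_cut(star))
```

On the four-node star every edge leaves one component with three of the four nodes, which is the worst case mentioned above. Repeating the cut on the larger component gives the recursive bisection described in the text.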

The property that all trees have small bisections was used by Leiserson [49, 50] and Valiant
[83] to show that all trees have linear-area layouts. We strengthen this result to show that the
maximum edge length of any N-node tree is bounded by $O(\sqrt{N}/\lg N)$. The details of the
layout are described in the following Theorem.


Theorem 3.3. Every N-node tree can be embedded in linear area with maximum edge
length $O(\sqrt{N}/\lg N)$.


Proof. As before, begin with the H-tree layout of a complete binary tree with $\lg^2 N$ leaves,
and blow up the layout in either dimension by a factor $\alpha\sqrt{N}/\lg N$, where $\alpha$ is a constant
specified later. The area of the layout becomes $4\alpha^2 N$ and the longest channel has length
$\frac{1}{2}\alpha\sqrt{N}$.

Find a set of O(lg N) nodes which bisect the tree and locate them at the center of the
layout. Place nodes of the tree in breadth-first levels starting with the bisector set as the roots
of the search, so that consecutive levels are distance $\beta\sqrt{N}/\lg N$ apart ($\beta$ is a constant specified
later). At every corner of the H-channel structure, bisect the remaining forest of subtrees so
that the subforests embedded within the two substructures are of equal size. Add the new
bisector set to the set of nodes from the previous breadth-first level, as shown in Figure 3.5.
In the new channel, start with the updated set as the root of a breadth-first search and repeat
the procedure used before. Finally, in the lowest level channels lay out the remaining subtrees
using the standard divide-and-conquer layout of Leiserson [49, 50] or Valiant [83].

Figure 3.5: Inserting new bisector sets at every corner.

As before, we need to ensure that every channel is wide enough to accommodate all the nodes
embedded within any level, and also that the layouts in the final step fit within the lowest level 
channels. 

Let us first calculate a crude upper bound on the total number of nodes embedded in any 
one breadth-first level. This quantity is certainly less than the total number of nodes embedded 
in all but the final-level channels. To bound the latter quantity, suppose that nodes in each 
bisector set within the H-channel structure are pulled in to the center of the layout, and the 
remaining nodes placed in breadth-first levels until the final-level channels. Bringing all the 
bisector sets towards the center can only increase the number of nodes in all but the final-level 
channels. Since an N-node tree has a bisector of size O(lg N), the total number of nodes within
the union of all bisector sets is bounded by:

$$O\left(\sum_{i=0}^{2\lg\lg N} \lg\frac{N}{2^i}\right) = O(\lg^2 N).$$

The total length of all channels encountered from the centre of the layout to the end
of a final-level channel does not exceed $2\alpha\sqrt{N}$. Since the distance between successive tree
levels is $\beta\sqrt{N}/\lg N$, the number of tree levels embedded within the H-channel is bounded by
$(2\alpha/\beta)\lg N$. Starting with $O(\lg^2 N)$ nodes as the roots of a breadth-first search, the number
of nodes encountered in $(2\alpha/\beta)\lg N$ levels cannot exceed $O(N^{2\alpha/\beta}\lg^2 N)$. Since every node
embedded within the H-channel must be in one such breadth-first level, the previous quantity
also bounds the total number of nodes within the H-channel structure. By choosing $2\alpha/\beta < 1/2$,
or $\alpha < \beta/4$, we see that the width of a channel asymptotically exceeds the number of
nodes in any level within the channel. Therefore, the first condition is satisfied by having
$\alpha < \beta/4$.

To ensure that the layouts at the final step fit within a final-level channel, it suffices to
check that the dimensions of a layout generated by the Leiserson-Valiant strategy are smaller
than the dimensions of the channel. The area of their layout of an x-node tree is linear in x, i.e., bounded
by $\gamma x$, for all x and some constant $\gamma$. In the layout described above, the size of a forest
embedded within a final-level channel cannot be more than $N/\lg^2 N$ because the tree is split
in half at each corner. The side of a layout at the final level is no greater than $\sqrt{\gamma N}/\lg N$.
By choosing $\alpha > \sqrt{\gamma}$, the side of the channel is guaranteed to be larger than a side of the
final-level layout. Therefore, by choosing $\alpha > \sqrt{\gamma}$ and $\beta > 4\alpha$, we see that the layout can be completed.
Finally, the area is linear in N and the maximum edge length is bounded by $O(\sqrt{N}/\lg N)$. ∎


3.3. Planar Layouts for Trees 


It is sometimes necessary to produce layouts in which distinct edges do not cross one 
another. Planar layouts have the advantage that only one layer of interconnect is required; by 
using a low-resistance metal layer, the resulting circuit is not only faster, but also dissipates less 
power. Many current automatic layout systems reserve a single layer of interconnect for special 
purposes such as, for example, power and ground connections. In such cases, it is necessary to 
find good planar layouts. Needless to say, the underlying connection scheme must be planar. 

Planar layouts may require much more area than non-planar layouts. In particular, Valiant 
[83] demonstrated an N-node planar graph for which every planar layout occupies at least
$\Omega(N^2)$ area and has edges of length $\Omega(N)$. On the other hand, Leiserson [49, 50] and Valiant
[83] showed that every N-node planar graph can be laid out in $O(N\lg^2 N)$ area with edges of
length $O(\sqrt{N}\lg N)$ in Thompson’s layout model, which allows distinct wires to cross.

Valiant [83] further showed that every tree has a linear-area planar layout. In other words,
the planarity restriction does not affect the asymptotic area requirements of trees. But what
about edge length? Intuitively, the length of a wire can be reduced by taking a short-cut across
another wire, instead of going around it. So, an important question is whether the planarity
requirement affects the maximum edge length for trees.

Although the layout of Section 3.2 has linear area, and asymptotically optimal edge length
in the worst case, it is not guaranteed to be planar. However, Ruzzo and Snyder [70] showed
that this layout could be transformed into a planar layout without increasing edge length
asymptotically. The details of their transformation are fairly complicated; in the following
Theorem, we present a simpler transformation.


Theorem 3.4. Every N-node tree has a linear-area planar layout with maximum edge
length $O(\sqrt{N}/\lg N)$.


Proof. The layout proceeds exactly as in the proof of Theorem 3.3, with particular
attention paid to the way a bisector set is chosen and to the ordering of nodes within the set.
In particular, if a forest of x nodes has to be separated from an N-node tree, $x \le \lfloor N/2 \rfloor$,
then it suffices to remove at most $\lceil \lg x \rceil$ nodes. The key fact is that these nodes can be chosen
from a single path in the tree. This path induces a natural linear ordering on the set of nodes
removed.

To see this, consider a binary tree rooted at a node of degree either one or two. It is always
possible to choose such a root, and if the remainder of the tree is drawn in levels then every
internal node has at most two sons. Label each node in the tree by the size of the subtree
rooted at that node and below it. Pick any node whose label is no less than x, and both of
whose sons have labels less than x. Mark this node. If its label equals x then we have found a
node whose removal separates a subtree of the required size. Otherwise, one of its sons must
have a label $y \ge \lceil x/2 \rceil$, while the other son has label no less than $x - y - 1$. Recursively
mark nodes in the subtree rooted at the second son so that the removal of the marked nodes
separates a forest of size $x - y - 1$. It is easily seen that the marked nodes lie along a path of the
original tree. Moreover, the removal of all marked nodes separates a component of size exactly
x. Finally, since the first node separates a component of size at least $\lceil x/2 \rceil + 1$, it follows that
no more than $\lceil \lg x \rceil$ nodes are marked. Figure 3.6 illustrates this procedure.

Figure 3.6: Three cuts separate a subforest of 18 nodes.

Figure 3.7: The removed nodes are placed in the order of occurrence along the path.
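
The marking procedure can be sketched in a few lines. The version below is our own reading of it, assuming the rooted tree is given as a dictionary `children` mapping each node to its list of sons (a hypothetical representation), and counting each marked node as part of the separated component, which is how the arithmetic $x - y - 1$ works out.

```python
def subtree_sizes(children, root):
    """Label every node with the size of the subtree rooted at it."""
    size = {}
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        kids = children.get(node, [])
        if expanded:
            size[node] = 1 + sum(size[k] for k in kids)
        else:
            stack.append((node, True))
            stack.extend((k, False) for k in kids)
    return size

def mark_path_separator(children, root, x):
    """Return the marked nodes, in order along a root-to-leaf path, whose
    removal separates a component of exactly x nodes."""
    size = subtree_sizes(children, root)
    if not 1 <= x <= size[root]:
        raise ValueError("x must be between 1 and the size of the tree")
    marked = []
    node, target = root, x
    while target > 0:
        # Descend to a node with label >= target whose sons all have label < target.
        while True:
            big = [k for k in children.get(node, []) if size[k] >= target]
            if not big:
                break
            node = big[0]
        marked.append(node)
        if size[node] == target:
            break                      # the whole subtree is what we need
        # Label > target, so the node has two sons: keep the larger son's
        # subtree whole and continue in the other with a reduced target.
        first, second = sorted(children[node], key=lambda k: size[k], reverse=True)
        target -= size[first] + 1
        node = second
    return marked

# Complete binary tree on 15 nodes, heap-numbered 1..15.
tree = {i: [2 * i, 2 * i + 1] for i in range(1, 8)}
print(mark_path_separator(tree, 1, 5))
```

For the 15-node complete tree and x = 5 the sketch marks nodes 2 and 10: the subtree under node 4 (three nodes) together with the two marked nodes makes up the five-node component, and both marked nodes lie on the path from the root through nodes 2 and 5.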


Given a tree, use the above procedure to find a set of nodes which bisect the tree, and lie
along a path. Place these nodes at the center of the layout in the same order in which they are
encountered along the common path. Next, find all nodes adjacent to the bisector set and place
them on either side as before. However, the ordering of nodes in these breadth-first levels is
chosen as follows: for each pair of nodes u, v that are placed next to each other in the bisector
set, if the path connecting them is $u, t_1, t_2, \ldots, t_k, v$, then place nodes $t_1$ and $t_k$ next to
each other in the second level, as shown in Figure 3.7. The orderings of nodes on either side of
the center again satisfy the condition that nodes connected by a path in the forest embedded
on that side appear in the order in which they are encountered along the common path.

By placing nodes in every level in the same order in which they lie along a common path
within the forest still to be embedded, it is easy to guarantee that the layout is planar inside
the channel (see Figure 3.7). All that remains is to guarantee that the layout can be made
planar at every corner when new bisector sets are added to a level.

When the end of a channel is reached, the situation is as shown in Figure 3.8. Nodes
$u_1, u_2, \ldots, u_n$ are those in the last level of the channel. The subgraph which remains to be
embedded is a forest of subtrees. The n nodes can be grouped according to which subtree they
belong to, nodes in the same subtree being adjacent within the ordering. To bisect this forest,
it suffices to split only one of these subtrees: order the subtrees top-down and pick the lowest
one so that the subforest above it contains at most one-half of all nodes in the forest. Split
the subtree this node belongs to into two components as required so that the original forest is
bisected. By laying out the next breadth-first level and the new bisector nodes as in Figure
3.8, we see that in each of the two lower-level channels the nodes within the same subtree are
ordered in the order in which they are encountered along a common path.

Figure 3.8: To bisect a forest of trees, only one tree need be separated.

Figure 3.9: Nodes in the final level may be connected to their subtrees without crossovers.
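
Choosing which subtree to split amounts to a prefix-sum scan. The following sketch assumes the subtree sizes are listed in top-down order; the function name and the sample sizes are ours.

```python
def choose_subtree_to_split(sizes):
    """Given subtree sizes in top-down order, return (index, take).

    index is the lowest subtree such that the subforest above it holds at
    most half of all nodes; take is the number of nodes that must be split
    off that subtree to complete the upper half.
    """
    total = sum(sizes)
    half = total // 2
    above = 0
    for index, size in enumerate(sizes):
        if above + size > half:
            return index, half - above
        above += size
    raise ValueError("the forest must contain at least one node")

# Hypothetical forest of four subtrees with 20 nodes in all.
print(choose_subtree_to_split([3, 5, 4, 8]))
```

Here the first two subtrees (8 nodes) stay whole in the upper half, and two nodes are separated from the third subtree, for example by the path-marking procedure of this section, to reach ten.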

Repeating this process further down the H-channel structure, we see that the layout is free
of wire crossings. To complete the layout, within the final-level channels we use Valiant’s [83]
linear-area planar layouts for each remaining subtree. Edges from these subtrees to nodes in
the last breadth-first level of the penultimate channel can be inserted without crossovers as
shown in Figure 3.9. This completes the planar layout. ∎


3.4. The Complexity of Minimizing Edge Lengths


Thus far we have only shown that every tree can be laid out with maximum edge length
bounded by $O(\sqrt{N}/\lg N)$. While this bound is asymptotically optimal for some trees such as
the complete binary tree, it is far from tight for others. For example, a two-ended string with every
node connected only to its immediate neighbors can be trivially laid out with every edge of
length one, independent of the number of nodes.

This motivates the problem: Given a tree, produce a layout with minimax edge length. In
this section we show that determining the minimax edge length is computationally intractable.
The results are quite discouraging: even the problem of deciding if a given tree can be laid
out with all edges of unit length is NP-complete.


Theorem 3.5. Given a tree T, deciding whether or not T has a layout with unit length
edges is NP-complete.


Proof. Observe that the problem is clearly in NP; it is easy to guess a layout and verify 
that no edge has length greater than one. It remains to show that the problem is NP-hard. 
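
The verification step is simple enough to write down. The sketch below assumes, as our own modelling choice, that a layout assigns each node a distinct integer grid point and that an edge has unit length exactly when its endpoints are horizontal or vertical neighbors; the names `edges` and `position` are hypothetical.

```python
def is_unit_length_layout(edges, position):
    """Check a guessed layout: distinct grid points, every edge of length one."""
    nodes = {v for edge in edges for v in edge}
    if any(v not in position for v in nodes):
        return False
    points = [position[v] for v in nodes]
    if len(set(points)) != len(points):
        return False                      # two nodes share a grid point
    for u, v in edges:
        (ux, uy), (vx, vy) = position[u], position[v]
        if abs(ux - vx) + abs(uy - vy) != 1:
            return False                  # edge is not of unit length
    return True

# A three-node string laid out along a row.
string_edges = [("a", "b"), ("b", "c")]
print(is_unit_length_layout(string_edges, {"a": (0, 0), "b": (1, 0), "c": (2, 0)}))
```

The check runs in time linear in the size of the tree, which is all that membership in NP requires.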

The known NP-complete problem used in the reduction is the NOT-ALL-EQUAL 3CNFSAT
problem [29, 72] stated below.

NOT-ALL-EQUAL 3CNFSAT: Given a boolean formula $\phi$ in 3CNF (conjunctive
normal form with three literals per clause), does there exist a truth assignment which satisfies
$\phi$ such that each clause contains at least one false literal?
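
For small formulas the problem can be decided by exhaustive search, which helps in checking the reduction on examples. The sketch below is a brute-force illustration only; the clause encoding (a positive integer k for variable k, a negative integer for its negation) is an assumption of ours.

```python
from itertools import product

def literal_value(literal, assignment):
    """Truth value of a literal under an assignment (a tuple of booleans)."""
    value = assignment[abs(literal) - 1]
    return value if literal > 0 else not value

def nae_assignment(num_vars, clauses):
    """Return an assignment in which every clause has at least one true and
    at least one false literal, or None if no such assignment exists."""
    for assignment in product([False, True], repeat=num_vars):
        if all(len({literal_value(l, assignment) for l in clause}) == 2
               for clause in clauses):
            return assignment
    return None

print(nae_assignment(3, [(1, 2, 3)]))
print(nae_assignment(1, [(1, 1, 1)]))
```

The search takes time exponential in the number of variables, consistent with the problem being NP-complete.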

Given a formula $\phi$ in 3CNF, we construct a graph G with the property that G can be
laid out with all edges of unit length if and only if $\phi$ is an instance of NOT-ALL-EQUAL 3CNFSAT,
i.e., $\phi$ can be satisfied with at least one false literal per clause. The graph G is constructed
from elementary components termed “lines” (Figure 3.10). The crucial property of a line is
its rigidity, meaning that in any layout with unit-length edges, nodes $u_1, \ldots, u_n$ must be lined
up either horizontally or vertically. Figure 3.10 shows how to connect two lines so that the
resulting graph can be laid out in only two ways (ignoring rotations).


Let $x_1, \ldots, x_n$ be the variables, and $C_1, \ldots, C_m$ be the clauses of $\phi$. The basic “skeleton”
of G is shown in Figure 3.11. For each j, $1 \le j \le m$, the distances (number of intermediate
nodes) between $a_i$ and $C_{j,i}$ and between $a_i$ and $C'_{j,i}$ are all equal. The line $u_i$-$v_i$ corresponds
to variable $x_i$, and the two ways of embedding it with respect to the A-B axis correspond to
assigning $x_i$ true or false.

Figure 3.10: A “rigid” line with exactly one unit-length layout (a). Two connected lines that can be laid
out in exactly two ways (b).

Figure 3.11: The skeleton of the transformation. Each column represents a variable, while each clause is
associated with two rows that are mirror images with respect to the x-axis.

Thus far, there are $2^n$ possible ways of laying out G with unit length edges, each corresponding
to a truth assignment to the variables of $\phi$. Next, we encode within G the “structure” of
$\phi$ as described below.

Let clause $C_j$ be denoted $l_{j_1} \vee l_{j_2} \vee l_{j_3}$. If $l_{j_i}$ is positive ($x_{j_i}$), add a “striker” at node $C_{j,j_i}$.
Otherwise, if $l_{j_i}$ is negative ($\bar{x}_{j_i}$), add a striker at node $C'_{j,j_i}$. Finally, for every $k \ne j_1, j_2, j_3$,
add strikers both at $C_{j,k}$ and at $C'_{j,k}$. For example, if $C_j = x_1 \vee x_2 \vee x_3$, the strikers are added
as shown in Figure 3.12.

Think of a node without a striker as a “hole”. The rows $C_j$ and $C'_j$ together share three
holes, and $2n - 3$ strikers. Because of the boundary constraints at the sides, no more than
$n - 1$ of these strikers may lie on any side of the A-B axis. In other words, for each clause
there must be at least one hole on either side of the axis in a unit-length layout. For each
clause, a hole “above” the axis implies a truth assignment which makes the clause true, while
a hole “below” the axis implies at least one false literal within the clause. Therefore, there
is a unit-length layout if and only if the formula is satisfiable with at least one false literal
per clause. In short, G has a unit-length layout iff $\phi$ is an instance of NOT-ALL-EQUAL
3CNFSAT. Since the reduction is easily carried out in polynomial time, the theorem follows. ∎

Figure 3.12: In any unit-length layout each row contains at least one hole; this corresponds to an
instance of NAE-3CNFSAT.

Figure 3.13: A binary tree which has a unique (up to rotations) unit-length layout.


In the above reduction, many nodes had degree four. We may strengthen the result to
binary trees with maximum degree 3. A rigid line may be implemented by stringing together
binary trees as shown in Figure 3.13. It is not hard to show that the structure is rigid; the key
property is that the complete binary tree on 31 nodes has a unique (up to rotations) unit-length
layout. This yields the following result.

Corollary 3.6. Given a binary tree, deciding whether or not it has a layout with unit-length
edges is NP-complete.


3.5. Assembling Complete Trees 


Whenever any system is larger than a single chip, it is necessary to partition it among 
separate chips which can be assembled at the printed circuit (or chip carrier) level. What is 
the most effective way to partition a large binary tree among several chips? 

This question is pressing because although integrated circuit technology has been advancing 
at a rapid pace, the technology for packaging chips has been crawling in comparison. Packaging 
technology severely restricts the number of external connections to an integrated circuit. While 
the number of components per chip is expected to reach one hundred million, no one foresees
chips with more than two or three hundred external pin connections.

This section presents Leiserson’s scheme [50] for assembling complete binary trees using one 
kind of chip with only four external pin connections. This chip has been used in tree-machine 
projects at Caltech and Bell Laboratories [16]. We review this scheme here for its simplicity 
and because the general scheme developed in Section 3.7 is based on similar ideas. 

Figure 3.14 shows how arbitrarily large complete binary trees can be built out of a single 
chip that has only four off-chip connections. Each chip contains one internal node of the tree,
and the remainder of the chip is packed as full as possible with an H-tree layout. The internal 
node requires three off-chip connections (denoted F, R, and L in the figure) for its father, right 
son, and left son. The H-tree requires only one off-chip connection (denoted T) to its father. 

To interconnect two chips, the unconnected internal node of one of the two chips is selected
as the father of the two H-trees. In Figure 3.14 the internal node on the left has been chosen for
this purpose. The R pin on this chip is connected to its own T pin, and the L pin is connected
to the T pin on the other chip. Considered as a unit, the combined two chips now have the
same structure as a single chip: three connections to an internal node and one to the root
of a complete binary tree. The pair of chips can be similarly combined with another pair to
produce a quadruple of chips, which can in turn be combined, and so forth. Figure 3.15 shows
a large complete binary tree which has been wired up in this recursive fashion.

Figure 3.15: A large complete binary tree assembled using many copies of the same chip.
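The recursive pairing of chips can be sketched in a few lines of Python. This is only an illustrative bookkeeping model, not part of Leiserson's scheme itself: the names `Unit`, `single_chip`, `combine` and `assemble` are hypothetical, and the H-tree on each chip is assumed to be a complete binary tree of height `h`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A group of chips that behaves like a single chip from the outside."""
    chips: int        # physical chips in the group
    tree_height: int  # height of the complete tree reached through pin T
    tree_nodes: int   # number of nodes in that complete tree
    wires: int        # off-chip wires already used inside the group


def single_chip(h: int) -> Unit:
    """One chip: an H-tree of height h plus one unconnected internal node."""
    return Unit(chips=1, tree_height=h, tree_nodes=2 ** (h + 1) - 1, wires=0)


def combine(a: Unit, b: Unit) -> Unit:
    """Join two equal units: a's spare node becomes the father of both trees.

    Two wires are added (R to a's T pin, L to b's T pin). The father's F pin
    serves as the new T pin, and b's spare node stays unconnected.
    """
    if a.tree_height != b.tree_height:
        raise ValueError("units must hold trees of equal height")
    return Unit(
        chips=a.chips + b.chips,
        tree_height=a.tree_height + 1,
        tree_nodes=a.tree_nodes + b.tree_nodes + 1,
        wires=a.wires + b.wires + 2,
    )


def assemble(h: int, doublings: int) -> Unit:
    """Combine 2**doublings identical chips pairwise, as described above."""
    unit = single_chip(h)
    for _ in range(doublings):
        unit = combine(unit, unit)  # two copies of the same kind of unit
    return unit


if __name__ == "__main__":
    print(assemble(h=2, doublings=3))
```

Under these assumptions, every combined unit still holds a complete binary tree plus exactly one spare internal node, which is why the same four-pin chip suffices at every scale.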


Unlike the assembly for complete trees, configurable or restructurable designs are required
for assembling arbitrary binary trees. The reason is simple: a single fixed chip with N processors
can realize only one N-node binary tree. In order to realize every N-node binary tree, either a
new mask must be designed for each tree, or else connections on the chip must be restructured
(for example, by laser) after fabrication. Given the ability to restructure wires on a chip, we
ask: Is there an area-efficient restructurable chip with $N$ processors and $m$ pins ($m \ll N$)
which can be used to assemble every binary tree, independent of its size?

This question is affirmatively answered in Section 3.7. The solution depends heavily on
the results of the next section, which considers the problem of partitioning a binary tree into
subforests of size $N$ so that every subforest has at most $O(\lg N)$ edges connected to nodes in
other subforests. The solution to this problem leads directly to the restructurable chip design
of Section 3.7.


3.6. Collinear Layouts and Two-color Bisectors 


This section introduces the notion of two-color bisectors for trees. Two-color bisectors
are a natural extension of graph bisectors, and will be critically used in partitioning graphs
for layout. In this section we show how to use two-color bisectors to partition an arbitrary
tree into subforests of size $N$ so that every subforest has at most $O(\lg N)$ edges connected to
nodes in other subforests. Bounds on the size of two-color bisectors are obtained from collinear
layouts developed by Bentley and Leiserson [50].


Definition. Suppose that an N-node graph G has $b$ black nodes and $w$ white nodes. A two-color
bisector for G is a set of edges whose removal bisects G into two subgraphs each of size
at least $\lfloor N/2 \rfloor$, and such that each contains at least $\lfloor b/2 \rfloor$ black and $\lfloor w/2 \rfloor$ white nodes.


Theorem 3.7. Every N-node forest of binary trees has a two-color bisector of size no greater
than $2 \lg N$.


Proof. Following Bentley and Leiserson [50], construct a collinear layout for the forest
as follows. By removing one edge, separate the forest into two subforests so that neither
contains more than $\lfloor \frac{2}{3}N \rfloor + 1$ nodes [52]. If either component contains more than $\lfloor N/2 \rfloor$
nodes, separate it into two smaller components using the one-separator theorem again. Next,
recursively construct collinear layouts for each subforest, and place these layouts side-by-side
along the baseline. Finally, as shown in Figure 3.16, connect the two (or three) subforests by
routing the separator edges on distinct vertical tracks and along a common horizontal track.
(For two components this is trivial since only one edge is routed; for three components, place the
subforest connected to both other subforests in the middle as shown.) For each node there are
three vertical tracks to accommodate edges incident to that node.

Figure 3.16: The recursive construction of a collinear layout.

The height of the layout is determined by a simple recurrence relation. Let $h(N)$ be the
height of the layout, so that $h(1) = 0$, and in general,

\[ h(N) \le h(\lfloor N/2 \rfloor) + 1. \]

A straightforward calculation yields $h(N) \le \lg N$.
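The “straightforward calculation” can be checked numerically. The sketch below takes the recurrence with equality (the worst case); the function name `layout_height` is hypothetical and chosen only for illustration.

```python
def layout_height(n: int) -> int:
    """Worst case of h(1) = 0, h(N) <= h(floor(N/2)) + 1, taken with equality."""
    height = 0
    while n > 1:
        n //= 2       # recurse on a component of at most floor(N/2) nodes
        height += 1   # one more horizontal track at this level
    return height


if __name__ == "__main__":
    for n in (1, 2, 7, 1000):
        print(n, layout_height(n))
```

Unrolled this way the recurrence gives the floor of $\lg N$, so $2^{h(N)} \le N$ and hence $h(N) \le \lg N$.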

Thus far we have ignored the coloring on the nodes. Suppose there are $b$ black nodes and
$N - b$ white nodes. Consider a “window” which overlaps $\lfloor N/2 \rfloor$ consecutive nodes, and place
it over the leftmost $\lfloor N/2 \rfloor$ nodes. If more than $\lfloor b/2 \rfloor$ black nodes fall within the window, slide
the window one position to the right. Observe that by sliding the window one position, the
number of black nodes within the window changes by at most one. Furthermore, by sliding
the window all the way to the right, fewer than $\lceil b/2 \rceil$ black nodes would fall within the window.
Consequently, there must be an intermediate placement of the window (see Figure 3.17) in
which exactly $\lfloor b/2 \rfloor$ black nodes and either $\lfloor (N - b)/2 \rfloor$ or $\lceil (N - b)/2 \rceil$ white nodes are
contained within the window. (Such a placement can be obtained in linear time.)
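The sliding-window argument translates directly into a linear-time scan. The following sketch assumes the nodes are given in baseline order as a sequence of 'B' and 'W' characters; `balanced_window` and `exhaustive_check` are hypothetical names, and the exhaustive check is merely a sanity test over small colorings, not a proof.

```python
from itertools import product


def balanced_window(colors):
    """Return s such that colors[s:s + N//2] holds exactly b//2 black nodes.

    colors is a sequence of 'B' / 'W' characters listed in baseline order.
    """
    n = len(colors)
    size = n // 2
    target = sum(c == "B" for c in colors) // 2
    count = sum(c == "B" for c in colors[:size])  # leftmost placement
    for start in range(n - size + 1):
        if count == target:
            return start
        if start + size < n:
            # Slide one position: one node leaves on the left, one enters
            # on the right, so the count changes by at most one.
            count += (colors[start + size] == "B") - (colors[start] == "B")
    return None


def exhaustive_check(max_n):
    """Confirm that a balanced window exists for every coloring up to max_n nodes."""
    for n in range(max_n + 1):
        for colors in product("BW", repeat=n):
            s = balanced_window(colors)
            if s is None:
                return False
            if colors[s:s + n // 2].count("B") != colors.count("B") // 2:
                return False
    return True


if __name__ == "__main__":
    print(balanced_window("BBBWWW"))
    print(exhaustive_check(10))
```

Cutting at the two endpoints of the returned window then yields the two sides of the bisector.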

Draw vertical lines through the endpoints of the window in the position obtained above.
The edges of the forest intersecting these lines form a two-color bisector of the forest. The size
of this two-color bisector is no more than twice the height of the layout; in other words, the
size of the two-color bisector is no more than $2 \lg N$. ∎

Figure 3.17: At some point, a window of size $N/2$ slid along the baseline must contain half
the black and half the white nodes.


For our purpose the following variant of two-color bisectors is appropriate. Suppose each 
node of an N-node forest is assigned a weight from a bounded set {1,2,...,k} of weights. We 
wish to bisect the forest into two equal-size subforests whose total weights differ by at most k. 
How many edges need be cut? Adapting the argument for two-color bisectors to this variant 
in a straightforward manner shows again that $2 \lg N$ cuts suffice.

Having obtained bounds on the size of two-color bisectors for forests, we wish to use them 
for partitioning an arbitrary binary tree into subforests of size at most N so that every subforest 
has $O(\lg N)$ edges connected to nodes in other subforests. This result is established in the
following theorem.


Theorem 3.8. Every N-node binary tree can be partitioned into $\lceil N/M \rceil$ subforests, each of
size at most $M$, such that no subforest has more than $4 \lg M + 8$ edges connected to nodes
in other subforests.


Proof. First bisect the tree into two subforests, each of size at least $\lfloor N/2 \rfloor$, by cutting
no more than $\lg N$ edges. Split each subforest recursively as follows. For each node in a
recursively split component of size $m$, assign a weight equal to the number of edges incident
to that node which were cut at a previous level. Since the degree of a node is at most
three, the weight assigned to a node is at most 2. From the argument following Theorem 3.7,
there is a weighted bisector of size no greater than $2 \lg m$ for the component. This weighted
bisector divides the number of external connections almost equally (the difference is at most
two) between the subcomponents of sizes $\lfloor m/2 \rfloor$ and $\lceil m/2 \rceil$. As seen in Figure 3.18, the number
of external connections into either of the new subcomponents is no more than the size of the
weighted bisector plus one-half the number of external connections into the component just
split (plus two). This recursive decomposition terminates when each component has size at
most $M$. Letting $E(m)$ be the number of external connections into any component of size $m$,
we have $E(N) = 0$, and

\[ E(m) \le \tfrac{1}{2} E(2m) + 2 \lg(2m) + 2. \]

A little calculation shows that $E(m) \le 4 \lg m + 8$. This means that every subforest of size
$m$ in the recursive decomposition has at most $4 \lg m + 8$ external edges to other subforests.
Substituting $M$ for $m$, the result follows. ∎

Figure 3.18: To keep the number of external connections to all subcomponents small when a
component is bisected, the external connections must be evenly divided between the
subcomponents.


3.7. Assembling Arbitrary Trees 


The recursive decomposition of Theorem 3.8 leads directly to the design of an efficient 
restructurable chip which can assemble all trees. Observe that the layouts developed in earlier 
sections cannot be used for configurable or restructurable design because the locations at which 
nodes are embedded are determined by the structure of the tree and are not the same for all 
trees. The only way to have nodes at fixed locations, independent of the tree structure, is by 
predetermining the tracks along which edges are routed. 

We can predetermine the tracks along which edges are routed by using restructurable
permuters. A permuter $P_k$ has $k$ terminals on each side of a rectangle and can realize any
one-to-one connection between the terminals. The switch shown in Figure 3.19 implements a
permuter. It has dimensions $2k \times k$, with the terminals along the longer sides.

Figure 3.19: A permuter can realize any set of one-to-one connections between the terminals
on its two sides.


Figure 3.20: A restructurable chip which can assemble arbitrarily large binary trees.


The construction of the restructurable chip is recursive and follows the recursive decomposition
of Theorem 3.8. We shall use $R_m$ to denote a level of the recursive layout with $m$
nodes, and let $R_M$ denote the restructurable chip of $M$ nodes itself. Figure 3.20 shows how
the chip $R_M$ is constructed from four copies of $R_{M/4}$, four copies of $P_{4 \lg M}$, and two copies of
$P_{4 \lg M + 4}$. Letting $S(M)$ be the length of the side of the layout, we have $S(1) = 1$ and

\[ S(M) \le 2S(M/4) + O(\lg M), \]

which yields $S(M) = O(\sqrt{M})$, so that the area is linear in $M$. The number of pins on $R_M$ is
$4 \lg M + 8$. We now show that every large tree can be assembled using $R_M$.
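To see why the side-length recurrence stays within $O(\sqrt{M})$, it can be unrolled numerically. In this sketch the hidden constant in the $O(\lg M)$ term is replaced by an assumed parameter `c`, and `side_length` is a hypothetical name; $M$ is restricted to powers of four so that $M/4$ is always an integer.

```python
def side_length(levels: int, c: float = 1.0) -> float:
    """S(M) for M = 4**levels, with S(1) = 1 and S(M) = 2*S(M/4) + c*lg M."""
    s = 1.0
    for j in range(1, levels + 1):
        s = 2.0 * s + c * 2 * j  # lg(4**j) = 2j
    return s


if __name__ == "__main__":
    for levels in (1, 2, 5, 10, 20):
        m = 4 ** levels
        # The ratio S(M) / sqrt(M) stays bounded as M grows.
        print(m, side_length(levels) / 2 ** levels)
```

Dividing by $\sqrt{M} = 2^j$ turns the recurrence into the convergent sum $1 + 2c \sum_j j/2^j < 1 + 4c$, so the logarithmic terms contribute only a constant factor to the side length.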
Theorem 3.9. Suppose each restructurable chip contains $M$ nodes. Then any N-node
binary tree can be assembled using $\lceil N/M \rceil$ chips, the minimum possible.


Proof. Following Theorem 3.8, decompose the tree into $\lceil N/M \rceil$ components, each of size
at most $M$ and having no more than $4 \lg M + 8$ external edges to other components. Each of
the $\lceil N/M \rceil$ components can be realized on a single chip $R_M$. To see this, use Theorem 3.8 to
recursively decompose each component into single nodes. In this decomposition each subforest
of size $m$ has at most $4 \lg m + 8$ external edges. This decomposition may now be mapped
directly onto the chip, using the permuters to route edges between different subcomponents.
Since the number of external edges at any level is no greater than the size of the permuters at
that level, the permuters can realize the desired routing. Nodes of the tree are embedded at
fixed positions in the lowest-level permuters. Finally, each chip has enough pin connections
so that the assembly can be completed off-chip by connecting the chips together as required
by the original decomposition. (Permuters are not needed off chip because wires can be routed
directly.)

The constant factors on area can be improved if one uses the smaller restructurable
permuter $P_k$ with dimensions $(k + O(\sqrt{k})) \times (k + O(\sqrt{k}))$ that follows from the channel routing
algorithm of Part II of this thesis. Whereas the simpler permuter from Figure 3.19 requires
only two welds to make a connection, the dense layout might require as many as $k$ welds for
each connection. Although the total number of welds required by either scheme is $O(M)$, the
number per wire is $O(\lg M)$ if the simpler switch is used and $O(\lg^2 M)$ if the channel-routing
permuter is used.

In related work, Rosenberg [69] has also considered permuters to obtain a degree of
configurability in layouts.


CHAPTER 4 


The General Framework 


This chapter presents a new framework for general graph layout. Like previous approaches
to graph layout, the new framework is based on the divide-and-conquer paradigm. Instead of
using a separator theorem to recursively partition a graph, the new framework uses graph
bifurcators. The notion of a graph bifurcator was introduced by Leighton [42] to overcome the
deficiency of separator theorems. Although the differences between bifurcators and separator
theorems will be elaborated in this chapter, there are two primary advantages of bifurcators over
separator theorems. First, unlike separator theorems, bifurcators may be efficiently computed
using either a good graph partitioning heuristic, or from a layout with small area. Second,
bifurcators can be used, as in the next chapter, to produce layouts that are efficient in a variety
of respects, not layout area alone.


The techniques for general graph layout closely parallel those in Chapter 3 for efficient
tree layout. Section 4.1 examines multi-colored bisectors for two-ended strings and forests of
complete binary trees, and generalizes the results of Section 3.6 to more than two colors.
Section 4.2 introduces decomposition trees and bifurcators as generalizations of separator
theorems. Section 4.3 considers the problem of balancing decomposition trees, just as Section
3.6 considered the problem of decomposing a tree while balancing the number of external edges
among split components. Section 4.4 introduces the tree of meshes which is a generalization
of the restructurable chip of Section 3.7, and investigates techniques for embedding general
graphs within the tree of meshes, given a balanced decomposition tree for the graph. Section
4.5 concludes by developing good layouts for the tree of meshes.

Taken together, an embedding of a graph within the tree of meshes and a good layout for
the tree of meshes induce a good layout for the embedded graph. The strategy for laying out a
general graph, given a decomposition tree, is: balance the decomposition tree, embed the graph
within the tree of meshes, and lay out the tree of meshes. In Chapter 5 we will see how this
strategy can be used to efficiently solve all the layout problems described in Chapter 2.


4.1. Combinatorial Lemmas 


This section contains three combinatorial lemmas which provide the foundation for the
framework presented in the next section.


Lemma 4.1. Consider any two-ended string of $n$ colored pearls of $k$ different colors, and
let $n_i$ be the number of pearls which are color $i$ for $1 \le i \le k$. For any integer $r \ge 2$,
the pearls can be partitioned into two sets by cutting the string in no more than $9r^k$ places
such that the total number of pearls in each set is $\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$, the number of pearls of
color 1 in each set is $\lfloor n_1/2 \rfloor$ or $\lceil n_1/2 \rceil$, and such that the number of pearls of color $i > 1$
in each set lies between $\lfloor (\frac{1}{2} - \frac{1}{2r}) n_i \rfloor$ and $\lceil (\frac{1}{2} + \frac{1}{2r}) n_i \rceil$.


Proof. Let $i$ be a number between 1 and $k$ and let $T(i)$ denote the number of cuts necessary
to divide the set of all pearls into two sets that satisfy the constraints of the lemma for colors
$1, 2, \ldots, i$. Other than requiring that the total number of pearls be split in half by the cuts, we
have made no constraints on the distribution of pearls with colors greater than $i$. We wish to
find a good bound on $T(i)$ in the worst case, i.e., over all choices of $n$, $k \ge i$, and all possible
colorings. In what follows, we will show that $T(1) = 2$ and that

\[ T(i) \le rT(i-1) + 4r + 7 \]

for $i > 1$. As a consequence, we can solve the recurrence to conclude that $T(i) \le 9r^i - 15$ for
$r \ge 2$. Thus for $i = k$, at most $9r^k$ cuts are required, as claimed.
For $i = 1$, the argument used in Theorem 3.7 shows that two cuts suffice. Consider a
“window” of size $\lfloor n/2 \rfloor$ positioned at the left end of the string. Without loss of generality,
assume that the window covers fewer than $\lfloor n_1/2 \rfloor$ of the pearls colored 1. Move the window to
the right, one pearl at a time, until the window covers $\lfloor n_1/2 \rfloor$ pearls of color 1. Since the right
half of the string contains more than one-half of all pearls of color 1, there must, by continuity,
exist a placement when the window covers exactly one-half of all pearls of color 1. By cutting
the string at the endpoints of the window, the portion of the string under the window will
contain half of the total number of pearls and half of the pearls colored 1. Hence $T(1) = 2$, as
claimed.

For a given $i > 1$, break the string into $r$ segments $S_j$, $1 \le j \le r$ (making $r - 1$ cuts), so
that each segment contains at least $\lfloor n_i/r \rfloor$ pearls of color $i$. Next split each $S_j$ into two subsets
$S_{j0}$ and $S_{j1}$ (making a total of $rT(i-1)$ cuts) so that each split satisfies the lemma locally
for colors $1, 2, \ldots, i-1$.

Without loss of generality, assume that $S_{j0}$ contains no fewer pearls of color $i$ than $S_{j1}$.
At this stage, we divide the set $C$ of all pearls into two subsets $C_1$ and $C_2$ as follows. Initially,
let $C_1 = \bigcup_j S_{j0}$. If $C_1$ contains more than $\lceil (\frac{1}{2} + \frac{1}{2r}) n_i \rceil$ pearls of color $i$, remove $S_{10}$ from $C_1$
and add $S_{11}$. Repeat this procedure, successively switching $S_{20}$ with $S_{21}$, $S_{30}$ with $S_{31}$, and so
on until the first time $C_1$ has at most $\lceil (\frac{1}{2} + \frac{1}{2r}) n_i \rceil$ pearls of color $i$. Such a stage must occur
since the number of pearls of color $i$ in $C_1$ will eventually fall to at most $\lfloor n_i/2 \rfloor$ if $C_1$ and $C_2$ are
completely interchanged. The number of pearls of color $i$ in $C_1$ after the final switch cannot
be less than $\lfloor (\frac{1}{2} - \frac{1}{2r}) n_i \rfloor - 2$ since every $S_j$ contains no more than $\lceil n_i/r \rceil$ pearls of color $i$. If
the number of pearls of color $i$ in $C_1$ is $\lfloor (\frac{1}{2} - \frac{1}{2r}) n_i \rfloor - 1$ or $\lfloor (\frac{1}{2} - \frac{1}{2r}) n_i \rfloor - 2$, then move either
one or two pearls of color $i$ from $C_2$ to $C_1$, making no more than four cuts.


We also have to ensure that the total set of pearls and the pearls of the first $i - 1$ colors are
divided as required. The pearls with colors between 2 and $i - 1$ are divided correctly because
they were divided correctly at the recursive step. The counts of pearls of color 1 in $C_1$ and $C_2$
may differ by $r$, however. To balance the number of pearls with color 1 in each set, we
need only remove up to $\lfloor r/2 \rfloor$ pearls colored 1 from the excess set (making at most $r$ cuts) and
put them in the deficient set. To balance the difference in the overall sizes of the sets (which
now might be as large as $2r + 4$), we need only extract up to $r + 2$ pearls from the larger set
(making no more than $2r + 4$ cuts) and put them in the smaller set. Of course, these pearls
must be chosen carefully so that each set retains the required minimum number of pearls of
each color. Since pearls are extracted only from the larger set, it is clear that this requirement
may be easily satisfied.


The total number of cuts made by the procedure is $rT(i-1) + 4r + 7$, as claimed. ∎
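The closed form $T(i) \le 9r^i - 15$ follows by induction: substituting the bound for $T(i-1)$ gives $9r^i - 11r + 7$, which is at most $9r^i - 15$ exactly when $r \ge 2$. The sketch below checks this numerically by taking the recurrence with equality; `cuts_bound` is a hypothetical name used only here.

```python
def cuts_bound(r: int, i: int) -> int:
    """T(i) from T(1) = 2 and T(i) = r*T(i-1) + 4r + 7, taken with equality."""
    t = 2
    for _ in range(2, i + 1):
        t = r * t + 4 * r + 7
    return t


if __name__ == "__main__":
    for r in (2, 3, 4):
        for i in (1, 2, 3, 4):
            print(r, i, cuts_bound(r, i), 9 * r ** i - 15)
```

For $r = 2$ the two columns stay within a few cuts of each other, which suggests the constant 15 was chosen so that the induction goes through at the smallest allowed $r$.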


Using an elegant topological argument, Goldberg and West [32] recently proved that $k$ cuts
suffice to divide the pearls of each color exactly in half. This dramatically reduces the number
of cuts, and makes our analysis significantly less cumbersome. All of our layout results may,
however, be proved with the weaker Lemma 4.1. Both results are implementable in polynomial
time when the number of colors is fixed, as is the case throughout this thesis.


Lemma 4.2. Consider any two-ended string of $n$ pearls, $n_i$ of which are colored $i$,
$1 \le i \le k$. By cutting the string in $k$ places it is possible to divide the pearls into two sets so
that each set has a total of $\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$ pearls, and $\lfloor n_i/2 \rfloor$ or $\lceil n_i/2 \rceil$ pearls of color $i$
for all $i$, $1 \le i \le k$.
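For small strings the statement of Lemma 4.2 can be explored by brute force. The sketch below is not the topological argument of Goldberg and West; it simply tries every choice of at most $k$ cut positions and assigns the resulting segments alternately to the two sets. The names `is_balanced` and `find_cuts` are hypothetical, and pearls are represented as characters of a Python string.

```python
from collections import Counter
from itertools import combinations


def is_balanced(part, whole):
    """True if `part` holds the floor or ceiling of half of every count in `whole`."""
    if len(part) not in (len(whole) // 2, (len(whole) + 1) // 2):
        return False
    have, total = Counter(part), Counter(whole)
    return all(have[c] in (total[c] // 2, (total[c] + 1) // 2) for c in total)


def find_cuts(string):
    """Search for at most k cut positions giving a balanced two-way split.

    A cut at position p separates string[p-1] from string[p]. Segments are
    assigned alternately to the two sets; only the first set is checked,
    since the second then receives the complementary counts.
    """
    k = len(set(string))
    n = len(string)
    for num_cuts in range(k + 1):
        for cuts in combinations(range(1, n), num_cuts):
            bounds = (0, *cuts, n)
            first = "".join(string[bounds[j]:bounds[j + 1]]
                            for j in range(0, len(bounds) - 1, 2))
            if is_balanced(first, string):
                return cuts
    return None


if __name__ == "__main__":
    print(find_cuts("AABB"))
    print(find_cuts("ABCABC"))
```

The search is exponential in $k$ and is meant only to make the lemma concrete on toy inputs.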


In the following, we recast Lemma 4.2 in terms of complete binary trees, which will be
particularly useful since the recursive decomposition of a graph may be viewed as a tree. The
height of a tree is the length of the longest path from the root to a leaf, while the height of a
forest is the maximum height of a tree in the forest. Finally, the level of a node in the forest
is defined to be the height of the forest minus the length of the longest path from the node to
a leaf. (Note that the top level is level zero.)


Lemma 4.3. Consider a forest of complete binary trees whose $n$ leaves are colored
arbitrarily with $k$ colors. Let $n_i$ be the number of leaves colored $i$ for $1 \le i \le k$. By
removing no more than $k$ nodes (as well as all incident edges) from each internal level of
the forest, it is possible to produce a new forest of complete binary trees, some subset of
which contains $\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$ leaves, and $\lfloor n_i/2 \rfloor$ or $\lceil n_i/2 \rceil$ leaves of color $i$ for each $i$,
$1 \le i \le k$.

Figure 4.1: An illustration of the procedure in Lemma 4.3.


Proof. Draw the trees in the canonical manner and place them side-by-side, in any order, 
so that the leaves of all trees are placed along a line. By applying Lemma 4.2 to the induced 
left-to-right ordering on the leaves of the forest, it is possible to break the ordering in no more 
than k places such that the union of the leaves contained in every other segment contains the 
desired total number of leaves and the desired number of leaves of each color. 

For each break, remove the nodes (and incident edges) which are simultaneously ancestors 
of the leaf immediately to the left of the break and the leaf immediately to the right of the 
break. It is easily seen that at most one node is removed from each internal level of the forest 
for each break. Therefore, no more than k total nodes are removed from each internal level. 
In addition, the removal of the common ancestors of the leaves neighboring a break divides
the associated tree into two or more complete binary trees, at least one on each side of the
break. Thus the removal of all such nodes produces a forest of complete binary trees, subsets
of which correspond precisely to the sets of leaves between pairs of adjacent break points. Thus
the union of the subsets of trees corresponding to every other segment of leaves contains the
desired number of leaves of each color. Figure 4.1 illustrates this procedure. ∎
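The node-removal step of the proof can be illustrated on a single complete binary tree. The sketch below assumes heap numbering of the nodes (root 1, children $2v$ and $2v+1$), which is a convenience of this example and not notation from the text; `split_at_break` is a hypothetical name.

```python
def split_at_break(height, left_leaf):
    """Remove the common ancestors of two adjacent leaves of a complete tree.

    Nodes use heap numbering: the root is 1 and node v has children 2v and
    2v + 1, so the leaves of a tree of the given height are
    2**height .. 2**(height + 1) - 1. The break lies between left_leaf and
    left_leaf + 1. Returns (removed, roots): the removed nodes and the roots
    of the complete subtrees that remain.
    """
    a, b = left_leaf, left_leaf + 1
    first, last = 2 ** height, 2 ** (height + 1) - 1
    if not (first <= a and b <= last):
        raise ValueError("both leaves must belong to the tree")
    # Climb from both leaves until the two paths meet.
    while a != b:
        a //= 2
        b //= 2
    removed = []
    while a >= 1:  # the meeting point and everything above it
        removed.append(a)
        a //= 2
    gone = set(removed)
    roots = sorted(c for v in removed for c in (2 * v, 2 * v + 1) if c not in gone)
    return removed, roots


if __name__ == "__main__":
    print(split_at_break(3, 9))   # break between leaves 9 and 10
    print(split_at_break(3, 11))  # break in the middle of the tree
```

Because the removed nodes lie on a single path to the root, at most one node is taken from each internal level per break, and every surviving root heads a complete subtree lying entirely on one side of the break.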
Figure 4.2: An $(F_0, F_1, \ldots, F_r)$-decomposition tree.


4.2. Decomposition Trees and Bifurcators 


The recursive decomposition of a graph into smaller and smaller subgraphs may be viewed
as a decomposition tree. In particular, we say that a graph $G$ has an $(F_0, F_1, \ldots, F_r)$-decomposition
tree if $G$ can be decomposed into two subgraphs $G_0$ and $G_1$ by removing no more than $F_0$ edges
from $G$, and, in turn, both $G_0$ and $G_1$ can be decomposed into smaller subgraphs by removing
no more than $F_1$ edges from each, and so on until each subgraph is either empty or an isolated
node. Figure 4.2 illustrates this recursive decomposition.

As one might expect, the decomposition of a graph by separator theorems may be viewed 
as a decomposition tree. It follows by definition that if a class of graphs has an $f(x)$-separator
theorem, then there are constants $\alpha$ and $\beta$ such that each graph in the class has a decomposition
tree of the form $(\beta f(N), \beta f(\alpha N), \beta f(\alpha^2 N), \ldots, \beta f(1))$. The converse is not necessarily true.
Subgraphs generated at each step of a decomposition by a separator theorem are constrained 
to be proportional in size, whereas decomposition trees need not satisfy this constraint. Of 
course, if the decomposition tree has precisely lg N levels, then subgraphs at each level must 
be equal in size. 

We shall be particularly interested in a special class of decomposition trees, namely bifurcators,
that is distinct from the class of separators.

Definition. An N-node graph has an $\alpha$-bifurcator of size $F$ (more simply, an $(F, \alpha)$-bifurcator)
if it has an $(F, F/\alpha, F/\alpha^2, \ldots, 1)$-decomposition tree.

Of particular interest is the class of $\sqrt{2}$-bifurcators. By the definition, we know that an
N-node graph has a $\sqrt{2}$-bifurcator of size $F$ if and only if it has an $(F, F/\sqrt{2}, F/2, \ldots, 1)$-decomposition
tree. The depth of this tree is no greater than $2 \lg F$. In order to completely
decompose an N-node graph into individual nodes, the height of any decomposition tree cannot
be less than $\lg N$. Thus, $F$ must always be at least $\sqrt{N}$. On the other hand, $F$ is always
less than $2N$ since every N-node graph with maximum node degree four has at most $2N$ edges.
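The sequence of cut allowances in a $\sqrt{2}$-bifurcator, and the depth bound $2 \lg F$, can be tabulated directly. This sketch uses a hypothetical helper `bifurcator_profile` and assumes $F \ge 1$; it lists the allowances level by level until they reach 1.

```python
import math


def bifurcator_profile(f):
    """Cut allowances F, F/sqrt(2), F/2, ... of an (F, sqrt 2)-bifurcator, down to 1."""
    depth = math.floor(2 * math.log2(f))  # last level whose allowance is >= 1
    return [f / 2 ** (i / 2) for i in range(depth + 1)]


if __name__ == "__main__":
    profile = bifurcator_profile(16)
    print(len(profile) - 1)  # depth of the decomposition tree
    print(profile)
```

With $F = 16$ the depth is $2 \lg 16 = 8 = \lg 256$, so a bifurcator of this size can fully decompose a graph of at most 256 nodes, in line with the requirement $F \ge \sqrt{N}$.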


If a class of graphs has an $x^\alpha$-separator theorem, where $\alpha < 1/2$, and the corresponding
decomposition is balanced in that every graph is always decomposed into equal-size subgraphs,
then it is straightforward to show that every N-node graph in the class has a $\sqrt{2}$-bifurcator of
size $O(\sqrt{N})$. Similarly, if a class of graphs has a balanced separator theorem of size $x^\alpha$ with
$\alpha > 1/2$, then every N-node graph in the class has a $\sqrt{2}$-bifurcator of size $O(N^\alpha)$.


The converse is not true even if we consider only bifurcators whose corresponding decomposition
trees are balanced so that every graph is decomposed into equal-size subgraphs. For
example, the N-node graph $S_N$ defined in Section 2.3 has a balanced $\sqrt{2}$-bifurcator of size
$O(\sqrt{N} \lg N)$ but the smallest separator for this class of graphs is $\Omega(x / \lg^2 x)$.

When translated into bounds on layout area, this seemingly minor difference between
bifurcators and separators is greatly magnified. Graphs with small layout area always have
small $\sqrt{2}$-bifurcators, but do not always have small separators. This is formalized in the
following lemma. Later on we will prove the converse: graphs with small $\sqrt{2}$-bifurcators always
have small layout area.


Lemma 4.4. If a graph $G$ can be laid out in area $A$, then $G$ has a $(\sqrt{A}, \sqrt{2})$-bifurcator.


Proof. Consider a vertical cut of length $\sqrt{A}$ through the center of the layout. Next, cut
each of the sublayouts horizontally through the center. Continuing this sequence of alternating
vertical and horizontal cuts, it is easy to see that at the $i$th step no more than $\sqrt{A}/2^{\lfloor i/2 \rfloor}$ edges
are cut from each subgraph. This sequence of cuts yields a $(\sqrt{A}, \sqrt{2})$-bifurcator for $G$. ∎
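The alternating cuts can be traced with a short simulation. The sketch assumes a square layout of side $\sqrt{A}$ and unit-width wires, so that a cut of length $L$ crosses at most $L$ edges; steps are numbered from 1, and `cut_lengths` is a hypothetical name.

```python
def cut_lengths(side, steps):
    """Length of the cut applied to each sublayout at steps 1, 2, ..., steps.

    Starts from a side x side square; odd steps cut vertically (halving the
    width), even steps cut horizontally (halving the height).
    """
    width = height = side
    lengths = []
    for step in range(1, steps + 1):
        if step % 2 == 1:
            lengths.append(height)  # a vertical cut runs the full height
            width /= 2
        else:
            lengths.append(width)   # a horizontal cut runs the full width
            height /= 2
    return lengths


if __name__ == "__main__":
    side = 16
    for i, length in enumerate(cut_lengths(side, 6), start=1):
        # Compare with the allowance of a (side, sqrt 2)-bifurcator at level i - 1.
        print(i, length, side / 2 ** ((i - 1) / 2))
```

Each cut length equals $\sqrt{A}/2^{\lfloor i/2 \rfloor}$ and never exceeds the bifurcator allowance $\sqrt{A}/(\sqrt{2})^{i-1}$ for the corresponding level.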
4.2.1. Special Cases


Many graphs have decomposition trees in which the number of cuts decreases very slowly
as we go lower down the tree. In such cases the number of cuts at higher levels of the tree may
be very small. On the other hand, in decomposition trees corresponding to bifurcators, the
number of cuts permitted decreases smoothly as we go down the tree. It is conceivable then,
that the bifurcator permits far more cuts at higher levels than are necessary. For example,
N-node binary trees have decomposition trees of height $O(\lg N)$ in which no more than 1 cut
is required at every level. Since the minimum bifurcator is at least $\sqrt{N}$, the decomposition
tree corresponding to the bifurcator allows far more cuts at the top levels than needed.

Similarly, some graphs have decomposition trees in which many cuts are required at the
top levels, but this number decreases very quickly as we go down the decomposition tree. In
such cases, the minimum bifurcator is large so that decomposition trees corresponding to the
bifurcator do not underestimate the number of cuts required at the top level. However, they
do greatly overestimate the number of cuts at lower levels.

It is useful to separate such extreme cases from a general discussion. Of course, general
upper bounds are valid for graphs with extreme decompositions, but they may overestimate 
the true bound. A particularly important reason for separating these classes is that many 
computationally useful graphs such as binary trees fall into the first category while cube- 
connected-cycles and multidimensional meshes fall into the second category. 

An N-node graph is defined to have a type A $\sqrt{2}$-bifurcator if it has an $(O(\sqrt{N}), \sqrt{2})$-bifurcator
such that no more than $O((N/2^i)^\alpha)$ cuts, $\alpha < 1/2$, are required for each partition
at the $i$th level of the associated decomposition tree. Observe that at the higher levels of the
tree, $i \ll \lg N$, the number of cuts is far less than the $O(\sqrt{N}/2^{i/2})$ cuts allowed by the usual
bifurcator.

Similarly, an N-node graph is defined to have a type B $\sqrt{2}$-bifurcator if it has an $(O(N^\alpha), \sqrt{2})$-bifurcator,
$\alpha > 1/2$, such that only $O((N/2^i)^\alpha)$ edges are cut in any partition at the $i$th level.
Observe that for the lower levels of the tree, $i \gg 1$, this quantity is far smaller than the
$O(N^\alpha/2^{i/2})$ cuts allowed by the usual bifurcator.


For simplicity, we will prove results only for general $\sqrt{2}$-bifurcators in this thesis. However,
whenever there is a significant difference, results for the special cases are stated separately. The
proofs for these special cases are easily worked out, and closely follow the proofs for the general
cases.


4.3. Balanced Decomposition Trees 


Of particular interest to the layout results reported in this thesis are decomposition trees 
where at each step of the decomposition, the two subgraphs are nearly equal in size. This section
considers such balanced decompositions and gives an effective procedure for transforming an 
arbitrary decomposition tree into one that is balanced. 

Formally, a decomposition tree for a graph $G$ is balanced if each subgraph $G_w$ in the tree
is the father of two subgraphs $G_{w0}$ and $G_{w1}$ such that the number of nodes in the subgraphs
differ by at most 1. In addition, we say that a decomposition tree is fully balanced if it is
balanced, and if for every subgraph $G_w$ in the tree, the set of edges connecting $G - G_w$ to $G_w$
is divided into two subsets of nearly equal size by the partition of $G_w$ into $G_{w0}$ and $G_{w1}$. (Here
we allow the number of edge connections in the two subgraphs to differ by a small constant,
say 5. For the purposes of simplicity, however, we shall often ignore such small differences and
assume that the nodes and connections are split evenly between the two subgraphs.)

Somewhat surprisingly, any decomposition tree may be transformed into a fully balanced
one at little or no cost. We prove this in the following theorem which generalizes earlier results
in [9, 40, 41, 42].


Theorem 4.5. Let $G$ be any N-node graph with an $(F_0, F_1, \ldots, F_r)$-decomposition tree
$T$. Then $G$ has a fully balanced $(F'_0, F'_1, \ldots, F'_{\lg N})$-decomposition tree, such that for
$0 \le i \le \lg N$,

\[ F'_i = 6 \sum_{s=i}^{r} F_s. \]


Proof. Let $\Gamma$ be a forest of complete binary trees consisting initially of the decomposition
tree T. Color the leaves of T with two colors according to whether or not the subgraph of G




associated with the leaf is empty. Apply Lemma 4.3 (k = 2) to $\Gamma$, removing the indicated nodes
and edges of T. Each node of T corresponds naturally to a set of edges of G, namely the edges
whose removal splits the associated subgraph in two. Removing a node of T corresponds to
removing this cutset of edges from G. Since no more than 2 nodes are removed from each level
of $\Gamma$, the number of edges removed from G in applying Lemma 4.3 does not exceed $2\sum_{s \ge 0} F_s$,
which is less than $F'_0$.

Further note that G is divided into two disjoint subgraphs of nearly equal size by the
removal of these edges. Each subgraph, in turn, corresponds in a natural way to a subforest
of complete binary trees in $\Gamma$. Consider one such subgraph $G_0$ and color the leaves of the
associated forest of complete binary trees $\Gamma_0$ using six colors as follows:

If the leaf corresponds to an empty subgraph, color the leaf with color 1. Otherwise, if the
single node corresponding to the leaf is incident to exactly $i$ edges of G removed earlier,
$0 \le i \le 4$, then color the leaf with color $i + 2$.

By applying Lemma 4.3 (k = 6) to $\Gamma_0$, it is clear that $G_0$ can be decomposed into two
disjoint subgraphs $G_{00}$ and $G_{01}$ of nearly equal size such that the number of edges from $G - G_0$
to $G_{00}$ is nearly equal to the number of edges from $G - G_0$ to $G_{01}$. Since at most 6 nodes were
removed from each level of $\Gamma_0$ and since $\Gamma_0$ does not contain the root of T, we can conclude
that no more than $6\sum_{s \ge 1} F_s = F'_1$ edges were removed from $G_0$.

By applying the above argument recursively, the desired fully balanced decomposition tree
is obtained. With each application of Lemma 4.3, the total number of leaves in each forest
is cut in half, so that the biggest tree in any forest corresponding to a subgraph
decreases in height by at least one. Also, $\lg N + 1$ levels suffice since the size of each subgraph
is also halved at each step. ∎


Theorem 4.6. Every graph with a $\sqrt{2}$-bifurcator of size F has a fully balanced $\sqrt{2}$-bifurcator
of size $6(2 + \sqrt{2})F$.


Proof. Immediate from Theorem 4.5, since $\sum_{s \ge 0} 2^{-s/2} = 2 + \sqrt{2}$. ∎
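
As an informal numerical check of Theorems 4.5 and 4.6 (not part of the original argument), the following Python sketch evaluates the balanced cut sizes $F'_i = 6\sum_{s \ge i} F_s$ for a $\sqrt{2}$-bifurcator, for which $F_s = F/2^{s/2}$. The function names and the choice of 41 levels are illustrative assumptions.

```python
import math

def sqrt2_bifurcator_levels(F, r):
    """Cut sizes F_s = F / 2^(s/2) for levels s = 0..r of a sqrt(2)-bifurcator."""
    return [F / 2 ** (s / 2) for s in range(r + 1)]

def balanced_cut_sizes(F_levels):
    """Cut sizes F'_i = 6 * sum_{s >= i} F_s of the fully balanced tree (Theorem 4.5)."""
    out = []
    suffix = 0.0
    for f in reversed(F_levels):   # accumulate the suffix sums from the deepest level up
        suffix += f
        out.append(6 * suffix)
    return out[::-1]

if __name__ == "__main__":
    F = 100.0
    balanced = balanced_cut_sizes(sqrt2_bifurcator_levels(F, 40))
    # Theorem 4.6 predicts balanced[i] <= 6 (2 + sqrt 2) F / 2^(i/2).
    print(balanced[0], 6 * (2 + math.sqrt(2)) * F)
```

Every computed $F'_i$ stays below $6(2 + \sqrt{2})F/2^{i/2}$ because the geometric series $\sum_{s \ge 0} 2^{-s/2}$ sums to $2 + \sqrt{2}$.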
Figure 4.3: The 4 × 4 tree of meshes $T_4$.


Remark. The procedure described in Theorems 4.5 and 4.6 can be implemented in polynomial 


time. 


4.4. Embeddings in the Tree of Meshes 


Leighton [40, 41] introduced the tree of meshes as an example of a planar graph that cannot 
be laid out in linear area. He also showed that every N-node planar graph can be embedded in 
an O(N lg N)-node tree of meshes. In this section, we define the tree of meshes and describe a 
general strategy for embedding a graph in the tree of meshes. 

The tree of meshes is formed by replacing each node of a complete binary tree with a mesh
and each edge by several edges which connect meshes at consecutive levels. More precisely, the
root of the complete binary tree is replaced by an $n \times n$ mesh (it is assumed that $n$ is a power
of 2), the nodes at the second level are replaced by $n \times n/2$ meshes, those at the third level
by $n/2 \times n/2$ meshes, and so on until the leaves of the tree are replaced by $1 \times 1$ meshes. As
shown in Figure 4.3, each edge of the tree is replaced with edges connecting nodes on one side
of the higher-level mesh to the top row of the mesh at the lower level. The resulting graph is
called the $n \times n$ tree of meshes $T_n$. It is not difficult to see that $T_n$ has $N = 2n^2 \lg n + n^2$
nodes.

In many cases, we use only the top levels of the tree of meshes. The subgraph consisting
of levels $0, 1, \ldots, p$ ($p \le 2 \lg n$) of $T_n$ is called a truncated tree of meshes $T_{n,p}$.
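
The node count can be checked by summing the mesh sizes level by level. The sketch below is a minimal illustration; the function names are ours, and we assume (as the definition suggests) that the column dimension is halved first and the row dimension next.

```python
def mesh_dims(n, level):
    """(rows, cols) of each mesh at the given level of T_n; level 0 is the root."""
    return n >> (level // 2), n >> ((level + 1) // 2)

def node_count(n, p=None):
    """Number of nodes in T_n, or in the truncated tree T_{n,p} when p is given."""
    lg_n = n.bit_length() - 1
    if n != 1 << lg_n:
        raise ValueError("n must be a power of 2")
    last = 2 * lg_n if p is None else p
    total = 0
    for level in range(last + 1):
        rows, cols = mesh_dims(n, level)
        total += (1 << level) * rows * cols   # 2^level meshes at this level
    return total

if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        lg_n = n.bit_length() - 1
        print(n, node_count(n), 2 * n * n * lg_n + n * n)
```

Each level contributes exactly $n^2$ nodes, and there are $2\lg n + 1$ levels, which gives the stated total.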
Theorem 4.7. There is a constant c such that every N-node graph G with an $(F, \sqrt{2})$-bifurcator
can be embedded in $T_{cF,\,2\lg(N/F)}$. Moreover, the embedding is regular in the sense
that $F^2/N$ nodes of G are embedded in a regular fashion in each of the $N^2/F^2$ bottom-level
meshes of $T_{cF,\,2\lg(N/F)}$.


Proof. We first use Theorem 4.6 to construct a fully balanced $\sqrt{2}$-bifurcator of size
$6(2 + \sqrt{2})F$ for G. We then use the internal meshes of $T_{cF,\,2\lg(N/F)}$ to route the edges that were
removed in the upper $2\lg(N/F)$ levels of the fully balanced decomposition tree for G. The subgraphs in
the $(2\lg(N/F))$th level of the decomposition tree (each of which has $\lfloor F^2/N \rfloor$ or $\lceil F^2/N \rceil$ nodes) are
then embedded in the meshes on the bottom level of the truncated tree of meshes.

The internal meshes are used as restructurable permuters. As we saw in Section 3.7,
terminals on opposite sides of a mesh can be connected in any order through the mesh. In
general, if the number of wires routed through a mesh does not exceed any side length of
the mesh, a routing may always be found. Similarly, a graph with $M$ nodes can always be
embedded in a $4M \times 4M$ mesh with nodes placed in a regular fashion.

Consider only the top $2\lg(N/F) + 1$ levels of a fully balanced decomposition tree for G. Each
of the subgraphs at level $2\lg(N/F)$ of the decomposition tree has $N(1/2)^{2\lg(N/F)} = F^2/N$ nodes.
(For simplicity we shall assume that $F^2/N$ is an integer.) Furthermore, if $E_i$ is the maximum
number of edges between $G - G_i$ and $G_i$, where $G_i$ is a subgraph in the decomposition tree at
level $i$, then it is easy to see that $E_0 = 0$ and, by Theorem 4.6, that

\[ E_i \le \frac{1}{2} E_{i-1} + 6(2 + \sqrt{2}) \frac{F}{2^{(i-1)/2}} \]

for $1 \le i \le 2\lg(N/F)$. Solving the above recurrence, we obtain:

\[ E_i \le 6(2 + \sqrt{2}) \frac{F}{2^{(i-1)/2}} \sum_{s \ge 0} (\sqrt{2}/2)^s , \]

and thus

\[ E_i \le 6(2 + \sqrt{2})^2 \frac{F}{2^{(i-1)/2}} . \]
We now embed G in $T_{cF,\,2\lg(N/F)}$. First, embed each of the $(2\lg(N/F))$-level subgraphs of the
decomposition tree in the bottom-level meshes. This can be done if the side of each mesh at
level $2\lg(N/F)$ is at least $4F^2/N$. This is true provided

\[ \frac{cF}{(\sqrt{2})^{2\lg(N/F)}} \ge \frac{4F^2}{N} . \]

For $c \ge 4$, this inequality is easily satisfied.

Next embed the additional edges through the upper-level meshes in the natural way. No
more than $2E_{i+1}$ edges pass through any $i$th level mesh. Thus the routing can be performed
if the smaller side of the $i$th level meshes is at least $2E_{i+1}$. In other words, we must have:

\[ \frac{cF}{2^{\lceil i/2 \rceil}} \ge 12(2 + \sqrt{2})^2 \frac{F}{2^{i/2}} . \]

A simple calculation shows that the inequality is satisfied for sufficiently large c. ∎
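
The "simple calculation" can be explored numerically. The sketch below iterates the recurrence $E_i \le \frac{1}{2}E_{i-1} + 6(2+\sqrt{2})F/2^{(i-1)/2}$ with equality, compares it with the closed form $6(2+\sqrt{2})^2F/2^{(i-1)/2}$, and tests the capacity condition on the smaller mesh side. The function names are ours, and the particular value $c = 12\sqrt{2}(2+\sqrt{2})^2$ is one sufficient choice that we assume for illustration, not a constant taken from the text.

```python
import math

def cut_bounds(F, levels):
    """Worst case of E_i = E_{i-1}/2 + 6(2+sqrt 2) F / 2^((i-1)/2), with E_0 = 0."""
    C = 6 * (2 + math.sqrt(2))
    E = [0.0]
    for i in range(1, levels + 1):
        E.append(E[-1] / 2 + C * F / 2 ** ((i - 1) / 2))
    return E

def closed_form(F, i):
    """Closed-form bound 6 (2+sqrt 2)^2 F / 2^((i-1)/2) on E_i."""
    return 6 * (2 + math.sqrt(2)) ** 2 * F / 2 ** ((i - 1) / 2)

def routing_fits(F, N, c):
    """Check that 2*E_{i+1} wires fit the smaller side of every level-i mesh of T_{cF, 2 lg(N/F)}."""
    levels = round(2 * math.log2(N / F))   # assumes N/F is a power of 2
    E = cut_bounds(F, levels)
    for i in range(levels):
        smaller_side = c * F / 2 ** math.ceil(i / 2)
        if 2 * E[i + 1] > smaller_side:
            return False
    return True

if __name__ == "__main__":
    c = 12 * math.sqrt(2) * (2 + math.sqrt(2)) ** 2
    print(routing_fits(64, 4096, c), routing_fits(64, 4096, 1))
```

The extra factor $\sqrt{2}$ in the assumed constant absorbs the rounding in $\lceil i/2 \rceil$ at odd levels.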


Remark. Throughout the thesis, we express bounds using the term $\lg(N/F)$. For all practical
purposes, $F$ is much smaller than $N$ and this term is greater than one. Should the value of
$F$ be larger, however, we shall still define $\lg(N/F)$ to be at least one. Similar interpretations are
assumed for $\lg\lg(N/F)$ and for $\lg\lg\lg(N/F)$. These conventions avoid the annoying (and trivial) cases
when F is very large without complicating the analysis further.


In the preceding embedding, all the nodes of G were mapped to meshes at the bottom level
of the truncated tree of meshes. Thus, edges between nodes in different meshes might have to
be routed through as many as $4\lg(N/F)$ meshes. Such long edges are undesirable for a variety of
reasons. It is natural to ask whether an embedding can be found in which each edge can be
routed through fewer intermediate meshes. This is answered in the following theorem.
Theorem 4.8. There exist constants c and k such that every N-node graph G with an
$(F, \sqrt{2})$-bifurcator can be embedded in $T_{cF,\,2\lg(N/F)}$ such that no edge is routed through
more than k intermediate meshes.

Proof. We adopt a slight variant of the strategy used in the previous theorems. The
balancing and embedding are done simultaneously and in the same manner as before, except
at levels $0, k, 2k, 3k, \ldots$ (where $k$ is a constant specified later). At these levels, we embed the
nodes that are incident to edges previously cut, and we cut the previously uncut edges incident
to these nodes. Of course, this could triple the number of cut edges every $k$ levels, but if $k$ is
sufficiently large, this happens infrequently and is not harmful. At all other levels the procedure
is the same as before, using 6 colors and Lemma 4.3 to partition the decomposition tree. The
process terminates after $2\lg(N/F)$ levels.

As before, the embedding is accomplished by using meshes as switching boxes for routing
edges. We must ensure that the number of edges routed through any mesh does not exceed the
side lengths of the mesh. The calculation is the same as before except that the number of cut
edges is tripled at every $k$th level. Thus the recurrence for $E_i$ is

\[ E_i \le \frac{1}{2} (3^{1/k}) E_{i-1} + 6(2 + \sqrt{2}) \frac{F}{2^{(i-1)/2}} . \]

Here, we have (without loss of generality) increased the number of cut edges by a factor of 3 initially
and by a factor of $3^{1/k}$ at each level instead of increasing the number of cuts by a factor of 3
at every $k$th level. Solving the recurrence, we find

\[ E_i \le 18(2 + \sqrt{2}) \frac{F}{2^{(i-1)/2}} \sum_{s \ge 0} \left( \frac{3^{1/k}}{\sqrt{2}} \right)^s . \]

For $k \ge 4$, the sum converges to a constant. The remaining analysis is the same as in the
previous theorems except that the constants are larger. ∎
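
The threshold on $k$ comes from requiring the ratio $3^{1/k}/\sqrt{2}$ of the geometric series to be less than one. The following sketch (our own illustration, with hypothetical function names) finds the smallest such integer $k$ and the limit of the sum.

```python
import math

def ratio(k):
    """Common ratio 3^(1/k) / sqrt(2) of the geometric series in the bound on E_i."""
    return 3 ** (1 / k) / math.sqrt(2)

def sum_limit(k):
    """Limit of sum_{s >= 0} ratio(k)^s, or infinity when the series diverges."""
    r = ratio(k)
    return math.inf if r >= 1 else 1 / (1 - r)

def smallest_k():
    """Smallest positive integer k for which the series converges."""
    k = 1
    while ratio(k) >= 1:
        k += 1
    return k

if __name__ == "__main__":
    print(smallest_k(), sum_limit(4))
```

Convergence requires $3^{1/k} < \sqrt{2}$, that is, $k > 2\lg 3 \approx 3.17$, which is why $k \ge 4$ suffices.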


Remark. It is worthwhile to point out here that Theorems 4.7 and 4.8 could also have been
proved using Lemma 4.1 instead of Lemma 4.2. The nodes of G would still be balanced in
the decomposition tree but the cut edges could only be split 1/3 - 2/3 at each decomposition.
While this increases the value of the sum, it still converges to a constant. (This is because, for
sufficiently large $k$, $\frac{2}{3}\sqrt{2}\,3^{1/k} < 1$.) Hence, k and c would be larger but the statements of the
theorems remain the same.

Figure 4.4: The H-layout of the tree of meshes.


4.5. Layouts for the Tree of Meshes 


Thus far we have considered only the problem of embedding graphs in the tree of meshes. 
How do we lay out the tree of meshes efficiently? Clearly, any layout for the tree of meshes 
also gives a layout for every graph that can be embedded within the tree of meshes. In this 
section we develop two different layouts for the tree of meshes. 

The first layout is a straightforward modification of the “H-tree” layout for complete binary 
trees [55]. The modified layout is obtained by expanding each node of the complete binary tree 
into a mesh of the appropriate size. Figure 4.4 shows this layout. It is easy to see that if $S(F)$
denotes the side of the layout for $T_F$, then $S(1) = 1$, and

\[ S(F) \le 2S(F/2) + O(F), \]

which gives $S(F) = O(F \lg F)$. This means that the area of the layout for $T_F$ is bounded by
$O(F^2 \lg^2 F)$. As shown in [40, 41], this bound is optimal.
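
To see where the $F \lg F$ growth comes from, one can unroll the recurrence for powers of two. The sketch below assumes the $O(F)$ term is exactly $cF$ for a constant $c$ of our choosing; under that assumption the recurrence has the exact solution $S(F) = F(1 + c\lg F)$.

```python
def h_layout_side(F, c=1):
    """Side length from S(1) = 1, S(F) = 2 S(F/2) + c F, for F a power of two.

    The constant c stands in for the hidden constant of the O(F) term.
    """
    if F == 1:
        return 1
    return 2 * h_layout_side(F // 2, c) + c * F

if __name__ == "__main__":
    for j in range(6):
        F = 2 ** j
        print(F, h_layout_side(F), F * (1 + j))
```

Squaring the side length gives the $O(F^2 \lg^2 F)$ area bound.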
For truncated trees of meshes, such as those considered in Theorems 4.7 and 4.8, a similar result
holds.
Theorem 4.9. The truncated tree of meshes $T_{F,\,2\lg(N/F)}$ has a layout of area $O(F^2 \lg^2(N/F))$.


Proof. The obvious restriction of the H-layout to the top levels suffices. ∎


Although the mesh edges in the layout shown in Figure 4.4 have length 1, the edges between
meshes can be quite long (nearly half the side of the layout). By pulling in meshes closer towards
the top level, we can reduce the length of the longest edge considerably. This technique was
introduced in Chapter 3 to produce minimax edge length layouts for trees, and generalizes to
graphs with known bifurcators. This layout will later be used to find layouts with short edges
for graphs embedded within the truncated tree of meshes.


Theorem 4.10. The truncated tree of meshes $T_{F,\,2\lg(N/F)}$ can be laid out in area $O(F^2 \lg^2(N/F))$
so that mesh edges have length 1 and edges between meshes have length at most
$O(F \lg(N/F) / \lg\lg(N/F))$.


Proof. Consider the H-tree layout of a complete binary tree of height $2\lg\lg\lg(N/F)$, and
having $(\lg\lg(N/F))^2$ leaves. Expand each linear dimension by a factor $\beta = O(F \lg(N/F) / \lg\lg(N/F))$, so
that each edge of the H-tree layout becomes a channel of width $\beta$ and each node becomes a
$\beta \times \beta$ square. The resulting area is $(\beta \lg\lg(N/F))^2 = O(F^2 \lg^2(N/F))$.

Since the channels are much wider than the side of any mesh, we can stack many meshes
within one channel. In particular, as seen in Figure 4.5, we embed the top-level mesh at the
center of the layout with the second-level meshes on either side. In the first stage of the layout,
the meshes in the top levels are placed together in a breadth-first manner. Meshes at successive
levels are equally spaced at distance $O(F \lg(N/F) / \lg\lg(N/F))$ apart.

We need to ensure that every channel is wide enough to accommodate the meshes stacked
within it. To this end, let us suppose that all meshes embedded in the first stage are stacked
together in the same channel. Of course, this is a gross overestimate, but it suffices for our
argument. Since the path from the root to a leaf in the original $(\lg\lg(N/F))^2$-leaf H-layout has
length $O(\lg\lg(N/F))$, a total of $c \lg\lg(N/F)$ levels of $T_{F,\,2\lg(N/F)}$ are embedded in the first stage. The
value of the constant c depends on the values of the other constants in the O-terms and can
be made as small as necessary.

The total number of meshes embedded in the first stage is no more than $2^{1 + c\lg\lg(N/F)}$. Each
mesh has side length no greater than $F$, so to stack all these meshes within one channel of side
$\beta$, it suffices to have:

\[ F \, 2^{1 + c \lg\lg(N/F)} \le O\!\left( \frac{F \lg(N/F)}{\lg\lg(N/F)} \right), \]

which is easily satisfied when $c < 1/2$. Hence every channel has sufficient width to stack all
the $i$th level meshes across the channel for any $i \le c \lg\lg(N/F)$.

Figure 4.5: An improved layout for the tree of meshes.

In the second stage, we embed the remaining meshes in the $\beta \times \beta$ squares. A total of
$(\lg(N/F))^c / (\lg\lg(N/F))^2$ copies of an $O(\lg(N/F))$-level $F/(\lg(N/F))^{c/2} \times F/(\lg(N/F))^{c/2}$ truncated
tree of meshes must be embedded in each of the $(\lg\lg(N/F))^2$ $\beta \times \beta$ regions to accomplish this.
Using the layout described in Theorem 4.9 for each copy, the total area required in each region is

\[ \frac{(\lg(N/F))^c}{(\lg\lg(N/F))^2} \cdot O\!\left( \frac{F^2}{(\lg(N/F))^c} \lg^2(N/F) \right) = O\!\left( \frac{F^2 \lg^2(N/F)}{(\lg\lg(N/F))^2} \right). \]

This is precisely the amount of area available in each $\beta \times \beta$ region. Hence the embedding is
possible.

It remains to verify that the edges between meshes have length $O(F \lg(N/F) / \lg\lg(N/F))$. This
is easily done since meshes in adjacent levels were spaced distance $O(F \lg(N/F) / \lg\lg(N/F))$ apart in
the first stage, and since meshes in adjacent levels were located in the same $\beta \times \beta$ region in
the second stage. ∎
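
The area accounting of the second stage can be sanity-checked numerically. The sketch below sets every hidden constant to 1 (an assumption made purely for illustration) and compares the area demanded by the copies in one region with the $\beta^2$ area available; the function name is ours.

```python
import math

def second_stage_area(F, N, c):
    """Return (area needed per region, area available per region), hidden constants set to 1."""
    L = math.log2(N / F)          # lg(N/F)
    LL = math.log2(L)             # lg lg(N/F)
    beta = F * L / LL             # expansion factor of the H-tree layout
    copies_per_region = L ** c / LL ** 2
    # Theorem 4.9 applied to a tree whose top mesh has side F / L^(c/2) and O(L) levels.
    area_per_copy = (F ** 2 / L ** c) * L ** 2
    return copies_per_region * area_per_copy, beta ** 2

if __name__ == "__main__":
    print(second_stage_area(2 ** 10, 2 ** 26, 0.25))
```

With unit constants the two quantities coincide, which mirrors the statement that the required area is precisely the area available.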


CHAPTER 5 


Solving the Layout Problems 


Using the framework described in the previous section, we are now ready to present general 
solutions to the eight problems posed in Chapter 2. The layout framework of Chapter 4 applies 
directly to most of these problems, supporting our belief that the divide-and-conquer strategy 
based on bifurcators is an efficient paradigm for VLSI graph layout. In particular, the tree of 
meshes emerges as an extremely versatile network for graph layout. While specific instances 
of some problems might be better solved using different techniques, the framework provides 
a novel and uniform approach for VLSI layout which effectively addresses various unrelated 
issues. The solutions presented in this section are evaluated by comparing them with known 


lower bounds. 


Problem 1. Given a graph G, produce an area-efficient layout for G.


By Theorem 4.7, every N-node graph with an $(F, \sqrt{2})$-bifurcator can be embedded in the
truncated tree of meshes $T_{O(F),\,2\lg(N/F)}$. Next, by Theorem 4.9, the truncated tree of meshes can
be laid out in $O(F^2 \lg^2(N/F))$ area. Therefore, every N-node graph with an $(F, \sqrt{2})$-bifurcator
can be laid out in $O(F^2 \lg^2(N/F))$ area.

As a consequence of Lemma 4.4, every N-node graph whose smallest $\sqrt{2}$-bifurcator is $F$
must occupy at least $F^2$ area, for otherwise the graph would have a $\sqrt{2}$-bifurcator strictly
smaller than $F$. Therefore, for every graph the upper bound is at most a factor of $O(\lg^2(N/F))$
worse than optimal, i.e., the area bound is universally close to optimal.


The bounds are also existentially optimal. Leighton [7, 42] has shown the existence of
N-node graphs with minimum $\sqrt{2}$-bifurcator $F$ which require area at least $\Omega(F^2 \lg^2(N/F))$. In
other words, no strategy based on bifurcators alone can asymptotically improve upon the
divide-and-conquer framework.


Special Cases. Graphs with $(F, \sqrt{2})$-bifurcators with either of the special forms described
in Section 4.2.1 have $O(F^2)$-area layouts. Thus, for example, N-node trees have $O(N)$-area
layouts.


Problem 2. Given a graph G, produce an area-efficient layout for G with minimax edge
length.


From Theorem 4.8 we know that every N-node graph with an $(F, \sqrt{2})$-bifurcator can be
embedded in the truncated tree of meshes $T_{O(F),\,2\lg(N/F)}$ so that no edge passes through more than
a constant number of intermediate meshes. Furthermore, the layout for the truncated tree of
meshes given in Theorem 4.10 guarantees that every edge between meshes has length bounded
by $O(F \lg(N/F) / \lg\lg(N/F))$ and that every edge within a mesh has length one. Combining these two
theorems, we see that every N-node graph with an $(F, \sqrt{2})$-bifurcator has an $O(F^2 \lg^2(N/F))$-area
layout with maximum edge length bounded by $O(F \lg(N/F) / \lg\lg(N/F))$.

This bound, too, is existentially optimal [7]. In other words, there exist N-node graphs
with minimum $\sqrt{2}$-bifurcator $F$ whose minimax edge length is $\Omega(F \lg(N/F) / \lg\lg(N/F))$.

Unfortunately, the bounds are not universally close to optimal. The only general lower
bound on minimax edge length for N-node graphs whose minimum $\sqrt{2}$-bifurcator is $F$ is
$\Omega(F^2/N)$. This general lower bound is also existentially optimal.

The problem of minimizing maximum edge length appears to be quite difficult. Although the
preceding bounds are disappointingly weak, they are the best known. Recall that in Chapter
3 we showed that even determining whether a tree can be laid out with minimax edge length one is
NP-complete.


Special Cases. The minimax edge length bounds for graphs with special $(F, \sqrt{2})$-bifurcators
are $O(\sqrt{N} / \lg N)$ for type A $\sqrt{2}$-bifurcators and $O(F)$ for type B $\sqrt{2}$-bifurcators.
Problem 3. Given a graph, produce an area-efficient layout in which each wire has
bounded delay in the capacitive model.


First we formalize some details of the model. As usual, a graph describes a connection of
processors, with an edge corresponding to a bidirectional link between two processors. Each
node is a processing element which contains one driver and one receiver for each incident edge.
Every transistor in a processing element has the same size. Thus, in our layouts, a node may
be represented by a long and skinny box of constant thickness, with length equal to the area
of an internal transistor. Since each node has bounded degree, a box will be just big enough
to contain all the transistors in the corresponding processor. Note that different nodes in the
layout will have different lengths, but the same thickness. We assume that the grid spacing is
adjusted so that nodes and edges have unit thickness and may be laid along grid lines. Although
wires are allowed to cross, we will not allow nodes to cross; this corresponds to transistors not
overlapping. Similarly, wires and nodes may not cross. The propagation delay over a wire of
length $l$ driven by a transistor of area $D$ with capacitive load $A$ is proportional to $(l + A)/D$.
The capacitive load presented to a transistor equals the sum of incident wire lengths and areas
of adjacent transistors.
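
The delay expression makes it easy to see why drivers must grow with the wires they drive. The sketch below evaluates $(l + A)/D$ with the constant of proportionality taken to be 1; that constant, the function name, and the sample values are all assumptions made for illustration.

```python
def wire_delay(wire_length, load_area, driver_area):
    """Delay in the capacitive model, proportional to (l + A) / D (constant taken as 1)."""
    return (wire_length + load_area) / driver_area

if __name__ == "__main__":
    for length in (1, 10, 100):
        fixed = wire_delay(length, 1, 1)             # unit-size driver and load
        scaled = wire_delay(length, length, length)  # driver and load sized like the wire
        print(length, fixed, scaled)
```

With unit-size transistors the delay grows linearly in the wire length, whereas sizing the transistors in proportion to the wire keeps the delay at a constant.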


Theorem 5.1. Every N-node graph G with an $(F, \sqrt{2})$-bifurcator has a bounded-delay
layout of area $O(F^2 \lg^2(N/F))$.


Proof. As in Theorem 4.8, embed G in a tree of meshes so that adjacent nodes are mapped
to meshes no more than a constant number of levels apart. Since the dimensions of meshes at 
successive levels, as well as the lengths of edges connecting adjacent meshes in the layout of 
Theorem 4.9, decrease at the same geometric rate, we know that the length of an edge of G is 
proportional to the side lengths of the meshes that contain the corresponding nodes. Assign to 
each node an area that is proportional to the side lengths of the mesh in which it is embedded. 
Thus, the capacitive load on any node, which equals the sum of the areas of all the incident 
edges and adjacent nodes, is proportional to the area of the node. In other words, every wire 


in the layout has bounded delay. 
Figure 5.1: Laying out expanded nodes in a mesh.


We need to ensure that each enlarged node can be accommodated in its assigned mesh
without blowing up the area of the layout by more than a constant factor. This can be done
by increasing the dimensions of each mesh by a constant factor, and laying out the nodes and
incident edges as shown in Figure 5.1. Notice that the nodes do not overlap other nodes or
wires. The area of each node remains proportional to the side lengths of the mesh containing
it, and thus the delay across every wire is bounded. ∎


Special Cases. Similarly, graphs with special $(F, \sqrt{2})$-bifurcators have $O(F^2)$-area bounded-delay
layouts. Thus, for example, every N-node tree has an $O(N)$-area bounded-delay layout.


Theorem 5.1 implies that the area bounds for bounded-delay layouts are no worse than 
the best known general area bounds for Problem 1. However, it is not known whether or not 
there exists a graph for which any bounded-delay layout requires asymptotically greater area 
than the minimum area layout. In the following corollary, we show that any increase in area 


need not be large. 


Corollary 5.2. Any layout of area A for an N-node graph can be transformed into a bounded-delay
layout of area $O(A \lg^2(N/\sqrt{A}))$.


Proof. By Lemma 4.4, an area A layout yields a $(\sqrt{A}, \sqrt{2})$-bifurcator which can be quickly
found. Next, by Theorem 5.1, a bounded-delay layout of area $O(A \lg^2(N/\sqrt{A}))$ can be easily
constructed. Observe that this transformation is effective. ∎
Problem 4. Given a graph G, produce a layout for G with few wire crossings.


The layouts for the truncated tree of meshes in Theorems 4.9 and 4.10 do not have any edge
crossings. Since every N-node graph G with an $(F, \sqrt{2})$-bifurcator can be embedded within the
truncated tree of meshes $T_{O(F),\,2\lg(N/F)}$, this means that the number of crossings in the layout for
G cannot exceed the number of nodes in $T_{O(F),\,2\lg(N/F)}$. In other words, the number of crossings
in the layout for G is bounded by $O(F^2 \lg(N/F))$.

Once again, this bound is existentially optimal [7]. Moreover, if the minimum $\sqrt{2}$-bifurcator
$F$ of an N-node graph is asymptotically greater than $\sqrt{N}$, the number of crossings
in the layout for G is no more than a factor $O(\lg(N/F))$ times optimal.


Special Cases. Graphs with special $(F, \sqrt{2})$-bifurcators can be laid out with $O(F^2)$ crossings.


Problem 5. Given a graph, produce an area-efficient regular layout for the graph. 


In Theorem 4.7, we showed how to embed any N-node graph G with an $(F, \sqrt{2})$-bifurcator
in $T_{cF,\,2\lg(N/F)}$ for some constant c. Moreover, the nodes of G were divided evenly among the
$N^2/F^2$ bottom-level meshes of $T_{cF,\,2\lg(N/F)}$, and in each bottom-level mesh, the nodes of G were
embedded in a regular fashion. Thus to produce an $O(F^2 \lg^2(N/F))$-area layout for G that is
regular, we need only produce a layout for $T_{cF,\,2\lg(N/F)}$ for which the nodes at the $(2\lg(N/F))$th level
are located in a regular fashion. In fact, we can do much better, as we show in the following
theorem.


Theorem 5.3. The truncated tree of meshes $T_{O(F),\,2\lg(N/F)}$ can be laid out in $O(F^2 \lg^2(N/F))$
area so that, for every level $i$, all nodes within $i$th level meshes are placed in a regular
fashion.


Proof. The first step is to construct an $O(\lg(N/F))$-layer three-dimensional layout [46] of the
truncated tree of meshes. Fold the connections between the root of the tree of meshes and
each of its two sons so that the sons fit naturally on a second layer over the root mesh. Fold
the connections to each of the meshes at the next lower level so they fit, on the third layer,
directly over the meshes on the second layer, and so forth. This generates an $O(\lg(N/F))$-layer
three-dimensional layout, with each layer occupying linear area. By projecting the three-dimensional
layout onto the plane in the manner of Thompson [80, pp. 36-38], the result follows. (The
same layout can be constructed by interleaving the meshes at each level.) ∎


Special Cases. The $O(F^2)$-area layouts for graphs with special $\sqrt{2}$-bifurcators are also regular.


Problem 6. Design area-efficient chips that can be configured to realize a large number 


of graphs. 


In Theorem 4.7 we showed that every N-node graph with an $(F, \sqrt{2})$-bifurcator can be
embedded in a truncated tree of meshes such that the nodes of the graph are embedded in a
regular fashion in the bottom-level meshes of $T_{cF,\,2\lg(N/F)}$. In fact, the nodes can be mapped to
fixed positions within the meshes. Therefore, if we lay out the truncated tree of meshes on a 


chip with processors at these fixed positions, we have a configurable chip for all graphs with
$(F, \sqrt{2})$-bifurcators. Thus the area bounds
for configurable layouts are the same as for unrestricted layouts.


Theorem 5.4. Every N-node graph with an $(F, \sqrt{2})$-bifurcator has a configurable layout
of area $O(F^2 \lg^2(N/F))$.


Proof. Simply make the connections in the meshes after the rest of the chip has been
fabricated. Recall that we used the meshes as crossbar switches in Theorem 4.7. ∎


Special Cases. Similarly, graphs with special bifurcators have $O(F^2)$-area configurable layouts.


The O(N)-area restructurable tree layout of Chapter 3 is such an example. 


Problem 7. On a wafer which has arbitrarily distributed defective cells, realize a given 


graph on the good cells. 


Theorem 4.7 showed how to embed any N-node graph G with an $(F, \sqrt{2})$-bifurcator in the
truncated tree of meshes $T_{O(F),\,2\lg(N/F)}$. The embedding had the property that nodes of the graph
could be mapped to fixed positions within the meshes at the bottom level. Accordingly, we
fixed processors at each of these positions.

Faulty processors on a wafer therefore correspond to faulty processors in the truncated tree
of meshes, the correspondence being induced via the layout for the tree of meshes. It is clearly
no longer possible to realize G in the faulty tree of meshes. However, it is possible to realize a
smaller graph with a similar structure using only the functioning processors.

More formally, consider a class of graphs for which any N-node graph in the class has a
$\sqrt{2}$-bifurcator of size $O(f(N))$, where the function $f$ is such that $f(x)/\sqrt{x}$ is nondecreasing for
increasing $x$. For example, $f(x) = \sqrt{x}$ for the class of square meshes (as well as for the class of
trees or the class of planar graphs). In what follows, we will show how to embed any M-node
graph from the class in any $T_{cf(N),\,2\lg(N/f(N))}$ that has M functioning processors, where $N \ge M$
and c is a sufficiently large constant.

In particular, we will show how to embed $T_{f(M),\,2\lg(M/f(M))}$ in the faulty tree of meshes. By
applying Theorem 4.7 to the smaller tree of meshes embedded within the faulty one, this will
prove our claim. Thus the layout strategy developed in Chapter 4 is impervious to the existence
of faulty processors. This result substantially generalizes and simplifies a similar result proved
by Leighton and Leiserson for embedding meshes around faults in [45].


Theorem 5.5. Given the preceding constraints on N, M, c and f, a completely functioning
truncated tree of meshes $T_{f(M),\,2\lg(M/f(M))}$ with M processors can be embedded in any partially
functioning truncated tree of meshes $T_{cf(N),\,2\lg(N/f(N))}$ with N processors (M of which are
functioning) so that the processors of the former are mapped onto the functioning processors
of the latter.


Proof. Label the functioning processors in each tree of meshes from 1 to M by counting
from left to right across the bottom level of each graph. (Recall that the processors are
evenly distributed on the bottom level.) Map the $k$th processor of $T_{f(M),\,2\lg(M/f(M))}$ onto the
$k$th functioning processor of $T_{cf(N),\,2\lg(N/f(N))}$. Route the edges of the former graph through the
meshes of the latter in the usual way, at the same time embedding meshes of the former in
blocks within the meshes of the latter.

It remains to show that the capacity of each mesh in $T_{cf(N),\,2\lg(N/f(N))}$ is sufficient for the
embedding. Consider a mesh X on the $i$th level of $T_{cf(N),\,2\lg(N/f(N))}$. This mesh has side lengths
$cf(N)/2^{i/2}$ and at most $N/2^i$ functioning processors below it in the bottom level of the
graph. The only meshes and edges of $T_{f(M),\,2\lg(M/f(M))}$ that are embedded in X are those that
correspond to roots of the forest of complete binary trees formed by removing the corresponding
interval of (at most $N/2^i$) processors in $T_{f(M),\,2\lg(M/f(M))}$. These roots are identified by splitting
$T_{f(M),\,2\lg(M/f(M))}$ (as in Lemma 4.3) at the two endpoints of the interval. There are at most two
roots at each level in the resulting forest and the sum of their side lengths (a geometrically
decreasing sum) is proportional to $f(M)/2^{j/2}$ where $j$ is such that $M/2^j \le N/2^i$. (Remember
that there are at most $N/2^i$ processors in the leaves of the forest so that the height of the
largest complete binary tree in the forest is $j$ where $M/2^j \le N/2^i$.) Thus the sum of the side
lengths of the meshes embedded in X is $O\!\left(f(M)\sqrt{N}/(\sqrt{M}\,2^{i/2})\right)$ which, for sufficiently large c, is less
than $cf(N)/2^{i/2}$ (this is the side length of X), since $N \ge M$ and $f(x)/\sqrt{x}$ is a nondecreasing
function. Hence X is large enough and the embedding is possible. ∎
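
The processor mapping at the start of the proof is a simple left-to-right relabeling. A minimal sketch follows, assuming the wafer's bottom level is given as a list of booleans (a hypothetical representation, as is the function name) and allowing the wafer to have more than M functioning processors.

```python
def map_onto_functioning(functioning, M):
    """Map processor k of the fault-free tree to the k-th functioning bottom-level position.

    functioning: booleans for the bottom-level processors of the faulty tree,
                 listed from left to right.
    Returns a list whose k-th entry is the position assigned to processor k.
    """
    good = [pos for pos, ok in enumerate(functioning) if ok]
    if len(good) < M:
        raise ValueError("fewer than M functioning processors")
    return good[:M]

if __name__ == "__main__":
    wafer = [True, False, True, True, False, True]
    print(map_onto_functioning(wafer, 3))
```

Because the assignment preserves left-to-right order, every interval of positions in the faulty tree receives a contiguous interval of processors from the fault-free tree, which is the property the capacity argument relies on.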


Special Cases. A similar argument works for graphs with special bifurcators. 


Problem 8. Given a graph G, assemble G using the minimum number of copies of a 


single chip having few external pin connections. 


Suppose that we wish to assemble N-node graphs with $(F, \sqrt{2})$-bifurcators but that each
chip contains only $m$ nodes, where $m < N$. Consider a chip consisting of a truncated tree
of meshes $T_{O(\sqrt{m}F/\sqrt{N}),\,O(\lg(\sqrt{mN}/F))}$ with the $m$ processors divided equally among the bottom-level
meshes, and external pin connections to the top of the top-level mesh. Two copies of this chip
may be wired together to form a truncated tree of meshes with $2m$ processors. Thus, graphs
with twice as many processors can be assembled with two chips than can be assembled on a
single chip. More generally, we have the following result.
Theorem 5.6. There is a universal restructurable chip with $m$ processors and $O(\sqrt{m}F/\sqrt{N})$
external pins, occupying area $O\!\left(\frac{mF^2}{N} \lg^2\frac{\sqrt{mN}}{F}\right)$, such that every N-node graph with an
$(F, \sqrt{2})$-bifurcator can be assembled using multiple copies of the universal chip. Furthermore,
the number of chips used in the assembly is the minimum possible.


Proof. Consider the top lg N − lg m levels of a fully balanced decomposition tree of
G. Each of the subgraphs at level lg N − lg m has $N/2^{\lg N - \lg m} = m$ nodes, and has a
$\sqrt{2}$-bifurcator of size $O(F\sqrt{m/N})$. By Theorem 4.7, each of these subgraphs can be
realized with a single universal chip consisting of a truncated tree of meshes
$T_{O(F\sqrt{m/N}),\,O(\lg(\sqrt{mN}/F))}$, whose area is bounded by
$O((F^2 m/N)\lg^2(\sqrt{mN}/F))$, and which has $O(F\sqrt{m/N})$ external pin connections. To
complete the assembly, the chips are wired up by making connections between pins on different
chips as given by the decomposition tree. ∎

A noteworthy consequence of this result is that when $F = O(\sqrt{N})$, the restructurable chip
has $O(\sqrt{m})$ pins, which is independent of the size of the network to be assembled. This is
the best possible. To realize networks with larger bifurcators, the parameters of the
restructurable chip depend on the size of the network assembled.
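
The pin count in this consequence can be checked by a short calculation. The sketch below assumes the usual reading of an $(F, \sqrt{2})$-bifurcator, in which the number of edges cut at level i of the decomposition tree is at most $F/(\sqrt{2})^i$. The symbol $F_m$ for the cut size at the chip level is introduced here only for illustration.

```latex
\begin{align*}
F_m &= \frac{F}{(\sqrt{2})^{\lg N - \lg m}}
     = \frac{F}{\sqrt{N/m}}
     = F\sqrt{m/N}, \\
F = O(\sqrt{N}) \;&\Longrightarrow\;
F_m = O\bigl(\sqrt{N}\cdot\sqrt{m/N}\bigr) = O(\sqrt{m}).
\end{align*}
```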


Special Cases. For graphs with special bifurcators, the same is true except that only $O(F^2)$
area is used on each chip. For type A $\sqrt{2}$-bifurcators, the number of pins needed is much lower.
For example, N-node trees require only O(lg m) pins per chip (Theorem 3.9). As is the case for
all planar graphs, the number of pins does not depend on the number of nodes. (This is because
N-node planar graphs have $\sqrt{2}$-bifurcators of size $O(\sqrt{N})$.)


CHAPTER 6 


The Channel Routing Problem 


While the layout problems considered in Part I provide new insights and paradigms for 
VLSI graph layout, they are nevertheless abstractions of problems encountered by current 
automatic layout systems. In this second part (Chapters 6 and 7) we shall study the widely en- 
countered channel routing problem which forms the basis of a popular paradigm for automatic 
layout. 

The typical routing problem is characterized by a set of rectangular modules with terminals
at fixed positions along module boundaries. Labels on the terminals specify the required 
connections — all terminals with the same label must be electrically connected. The problem is 
to wire together all terminals that have the same label. 

Most layout systems proceed in two phases: placement and routing. In the placement phase
the modules are located at fixed positions, and the required connections are later made in the 
routing phase by running wires around and in between the modules. Of course, the two phases 
go hand-in-hand; a placement for which a complete routing is impossible is of little use. The 
intractability of obtaining optimal solutions in either phase demands that efficient heuristics 
be developed for practical use. 

Introduced by Hashimoto and Stevens in 1971 [34], channel routing has become a very 
popular and successful heuristic for routing integrated circuits. As illustrated in Figure 6.1, 
after the modules have been placed, the chip is heuristically partitioned into a set of rectangular
channels, and each channel is assigned a set of wires which are to pass through it. This
effectively reduces a difficult “global” wiring problem to a set of disjoint (and presumably 


easier), “local” channel routing subproblems. 
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Figure 6.1: Reducing the global wiring problem into a set 
of channel routing subproblems. 


The performance of the overall strategy is largely determined by the algorithm used to solve 
the individual channel routing subproblems. For this reason, the channel routing problem has 
been intensively studied for over a decade, and many heuristic algorithms have been proposed
for solving the problem [1, 2, 11, 12, 18, 20, 21, 34, 35, 36, 38, 51, 60, 62, 67, 68, 81, 84].
Although many of these heuristics have proved reasonably successful in practice, there are
instances (albeit theoretical) when the heuristics either produce arbitrarily bad solutions or
fail to produce any solution. Chapter 7 presents a fast approximation algorithm which is
guaranteed to produce a solution close to optimal. The remainder of this chapter, however, 
poses the problem in a formal framework and briefly reviews some of the previous work on 


channel routing. 


6.1. Manhattan Routing Within Channels 


The channel routing problem may be described as follows. A channel consists of a two-layer 
rectangular grid of columns and tracks (rows). Terminals are located on the top and bottom 
tracks at grid points. The number of tracks between the top and bottom tracks is the width of 
the channel. Each set of terminals to be electrically connected constitutes a net, and distinct 
nets are disjoint. A net with r terminals is called an r-point net. The width may be varied
by moving the tracks vertically; however, the tracks are not allowed to slide horizontally. In
other words, the columns are fixed. We also assume that there are no trivial nets (two-point
nets with both terminals in the same column).

Figure 6.2: Manhattan routing within a channel. Vertical
cuts measure channel density.

The objective of the channel routing problem is to wire together all terminals in each net 
in a way which minimizes channel width. Wires may be routed on either layer, along any 
track between the top and bottom tracks, and along any column. There is no restriction on 
the number of columns at either end. Electrically disjoint wires may cross at grid points on 
different layers, but may not overlap for any distance even on different layers. A wire may 
change layers at a grid point, in which case no other electrically disjoint wire may pass through 
that grid point on either layer. 

In the Manhattan wiring model, these constraints are satisfied by restricting all horizontal 
wire segments to lie on one layer, and all vertical segments to lie on the other layer. For a wire 
to turn a corner it has to change layers, which requires a contact cut. Clearly, distinct wires 
cannot share a corner since that would violate the constraint that only one wire may change 
layers at any point. For obvious reasons, Manhattan routing is also referred to as layer per 
direction or reserved layer routing. Figure 6.2 illustrates an example of Manhattan routing in 


a channel. 


Remark. The channel routing problem described above is a simpler version of switchbox routing
in which terminals are located on all sides of a rectangular channel. In many instances, such
as when two large modules are placed next to each other, terminals lie only along two opposite
sides of a channel. For this reason, and because the switchbox routing problem is much more
difficult, engineers have focused attention primarily on the simpler channel routing problem.
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6.2. Bounds on Channel Width 


Consider a vertical cut which slices the channel in two (see Figure 6.2). Every net which 
has a terminal on both sides of the cut is said to be split by the cut. Since at least one wire
must cross the vertical cut for each split net, it follows that at every point the channel must 
be at least as wide as the number of nets split by a vertical cut through that point. In short, 
channel width can be no less than channel density, which is defined as the maximum number 
of nets split by a vertical cut. For example, the channel of Figure 6.2 has density three. 
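
As a minimal sketch of this definition, the function below computes channel density from the two terminal rows. The input encoding is an assumption made for illustration: `top` and `bottom` are equal-length lists of net labels, one entry per column, with 0 marking an empty grid point, and vertical cuts are taken between adjacent columns.

```python
def channel_density(top, bottom):
    """Maximum number of nets split by any vertical cut between two columns."""
    first, last = {}, {}
    for col, (t, b) in enumerate(zip(top, bottom)):
        for net in (t, b):
            if net:  # 0 means "no terminal here"
                first.setdefault(net, col)
                last[net] = col
    density = 0
    for cut in range(len(top) - 1):  # cut lies between columns cut and cut + 1
        # A net is split when it has terminals on both sides of the cut.
        split = sum(1 for net in first if first[net] <= cut < last[net])
        density = max(density, split)
    return density
```

For the shift-one channel of Theorem 6.1 this yields density one, since each cut between neighbouring columns separates the two terminals of a single net.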

Can every channel with density d be routed in O(d) tracks? In practice, most channels can 
be routed in d plus two or three tracks. In general however, this is far from the truth. Brown 
and Rivest [14] gave examples of two-point net channels, with n terminals, whose density is 
one, but for which channel width can be no less than $\sqrt{2n}$. Since we shall employ an identical
argument later, their result is rederived below.


Theorem 6.1 (Brown-Rivest). Consider the two-point, n-net (shift-one) channel in
which terminal i is located in column i on the top track, and in column i+1 on the bottom
track. Any Manhattan routing for this channel must have width at least $\sqrt{2n} - 1$.


Proof. Suppose that a routing of width w is given. Since the top and bottom terminals 
of any net lie in different columns, each wire in the routing must use a horizontal track to 
change columns at least once. Now, if a wire changes from column i to column j along track y
($1 \le y \le w$) then either the vertical segment from (j, y−1) to (j, y) or the segment from
(j, y) to (j, y+1) cannot have a wire laid on it. Otherwise, as seen in Figure 6.3, two different
nets will overlap at point (j, y).

In other words, whenever a wire changes columns within the channel, it must change to a 
blank column, one which has no wire in one incident vertical segment. A wire may also change 
columns by exiting across a side of the channel along a horizontal track. 

How many wires can change columns along the first horizontal track? Since all grid points
on the top track are occupied, a wire can change columns only by exiting the channel. But,


since segment overlaps are prohibited, at most two wires can change columns in this way. 
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Figure 6.3: A wire can only turn into a blank spot. 


Observe that whenever a wire exits the channel, one blank segment is created along a column. 

The number of wires that can change columns on any horizontal track is bounded by 
the number of blank vertical segments incident to that track, plus two (for wires that exit the 
channel). If 2 wires change columns on the first horizontal track, this creates two empty vertical 
segments incident to the second track, so that 4 wires can change columns on the second track, 
and so on. In general, it is easy to see that the number of wires that can change columns on 
track y is at most 2y when $y \le \lfloor w/2 \rfloor$ and at most 2(w + 1 − y) otherwise.

Summing over all horizontal tracks, the total number of wires that can change columns is
consequently no greater than

$$\sum_{0 < y \le \lfloor w/2 \rfloor} 2y \;+\; \sum_{\lfloor w/2 \rfloor + 1 \le y \le w} 2(w - y + 1),$$

which is never greater than $\frac{1}{2}(w+1)^2$. Finally, since every wire connecting a net has to
change columns, we have

$$\tfrac{1}{2}(w+1)^2 \ge n,$$

or $w \ge \sqrt{2n} - 1$, thus proving the result. ∎
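
The counting argument in this proof is easy to check numerically. The sketch below is illustrative only: `turn_capacity` and `width_lower_bound` are names introduced here, and the second function returns the lower bound implied by the counting argument, not the true minimum width of the channel.

```python
import math

def turn_capacity(w):
    """Upper bound from the proof on how many wires can change columns
    in a channel of width w: 2y on the upper tracks, 2(w - y + 1) below."""
    half = w // 2
    upper = sum(2 * y for y in range(1, half + 1))
    lower = sum(2 * (w - y + 1) for y in range(half + 1, w + 1))
    return upper + lower

def width_lower_bound(n):
    """Smallest w whose turn capacity can accommodate all n nets."""
    w = 0
    while turn_capacity(w) < n:
        w += 1
    return w
```

For every width the capacity stays within $\frac{1}{2}(w+1)^2$, so `width_lower_bound(n)` is never smaller than $\sqrt{2n} - 1$.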


An obvious question that arises is: Can every channel be quickly routed in minimum width? 
Unfortunately, the general problem is NP-complete [77], and remains NP-complete even for
two-point nets [77, 78]. This might help explain why none of the current heuristics is even


guaranteed to find solutions that are close to optimal. 
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Figure 6.4: In the knock-knee wiring model, two wires may 
share a corner as long as they remain on different
layers. 


6.3. Bounds for Other Wiring Models 


While Manhattan wiring rules ease the task of mask fabrication, less restrictive wiring 
models are also occasionally used. For example, some manufacturers may permit wires to 
change direction within a layer, or may allow non-rectilinear wiring. Similarly, other manufac- 
turers may provide more than two layers of interconnect. It is important to consider how 
variations in the wiring rules affect the routability of channels.

In the knock-knee wiring model, wires are allowed to change direction within a layer, and 
wires on different layers may share a grid point as long as neither one changes layers at that 
point. The routing illustrated in Figure 6.4 is permissible in the knock-knee model, but not 
in the Manhattan model. Channel density of course remains a lower bound on channel width. 
Rivest, Baratz, and Miller [67] investigated the channel routing problem under the knock-knee
wiring model. They showed that every two-point net channel with density d can be routed in 
width 2d — 1, independent of the number of nets. In view of Theorem 6.1, this implies that 
the knock-knee wiring model is more powerful than the Manhattan wiring model. Leighton 
[43] gave a construction for channels with density d which cannot be routed in less than 2d—1 
tracks, so that the Rivest, Baratz, and Miller algorithm is optimal in the worst case. For 
multi-point net channels, their algorithm guarantees a routing of width at most 4d — 1. 

Preparata and Lipski [62] consider the channel routing problem under the knock-knee 
model, but with three layers of interconnect instead of only two. With this extra layer, they 
guarantee that every two-point net channel with density d can be optimally routed using exactly 


d tracks. Moreover, this routing can be accomplished quickly. For multi-point net channels, 
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their algorithm guarantees a routing of width no greater than 2d. 

The problem of “river routing,” which is single-layer channel routing, has also received 
considerable attention [21, 23, 51, 81]. Under the single-layer restriction, there exist fast 
algorithms for channel routing. In particular, Leiserson and Pinter [51] also examine the 
problem of placing movable modules along the top and bottom tracks so as to minimize the 
horizontal “spread” and width of a channel. Pinter [61] also studies the problem of river routing
within polygonal regions with terminals along the perimeter of the polygon. Finally, LaPaugh 
[39] studies the problem of wiring terminals placed along the perimeter of a rectangular module 


where the wires are on two layers, but are restricted to lie outside the module.


CHAPTER 7 


An Approximation Algorithm for Manhattan Routing 


Brown and Rivest’s lower bound for the one-shift example indicates that channel density is 
not the only fundamental limitation on channel width. Motivated by their argument, Section 
7.1 introduces the concept of channel flux, which provides another fundamental limitation
on channel width. Unlike density, flux is a local phenomenon and captures the amount of 
“congestion” within a channel. 

Flux and density together completely characterize the difficulty of Manhattan routing. 
Section 7.2 presents a linear-time algorithm which routes every two-point net channel in width 
proportional to its flux and density. This settles a conjecture of Brown and Rivest that their
lower bounds are tight to within a constant factor. Moreover, in practice, flux is extremely 
small so that the algorithm for two-point nets uses no more than a constant number of tracks 
more than density. Section 7.3 analyzes the running time of the algorithm, while Section 7.4 


extends the algorithm to multi-point net channels. 


7.1. Channel Flux 


While channel density provides a fundamental limitation on channel width, it fails to 
capture the local congestion inside a channel. For example, while the one-shift channel has 
low density, the channel width must nevertheless be large to overcome congestion within the 
channel. This congestion arises from the fact that every column in the top track contains a 
terminal whose mate lies in a different column along the bottom track. Since wires in adjacent
columns may not both “turn right” along a common track without colliding, many horizontal
tracks are needed to complete the wiring.

Figure 7.1: The modified one-shift channel can be routed
in width two.

In striking contrast, consider modifying the one-shift channel by making every alternate 
column blank. While this channel is globally similar to the one-shift, it can be routed using 
only two horizontal tracks as shown in Figure 7.1. This channel is not locally congested because 
the empty columns enable many wires to simultaneously turn along the same horizontal track. 

We now introduce the concept of channel flux to measure congestion. Although there are 
a variety of ways to measure congestion, we choose here a simple definition which permits a 
clean analysis. In Section 7.4 we vary the definition slightly to obtain better bounds. 

Suppose that instead of making vertical cuts in the channel, we make a horizontal
cut which isolates a set of contiguous columns from one track. Observe that we can vary the 
size of a cut (measured by the number of columns within the cut) as well as its position. As 
before, we say that a net is split by a horizontal cut if it contains terminals both within the 
cut and outside. For any given position of a cut we can measure the number of distinct nets 
split by the cut. 

Intuition suggests that the greater the number of distinct nets split by a cut, the greater 
the congestion is within the cut. Moreover, the larger the size of a congested cut, the larger 
the channel width, because if the region of local congestion is very large, then so is the overall 
global congestion of the channel. This intuition is formalized below. As mentioned earlier, we 


restrict attention only to channels which do not contain any trivial nets. 


Definition. The flux of a channel is the largest integer f for which there exists a horizontal
cut of size $2f^2$ which splits at least $2f^2 - f$ nontrivial nets.

For example, the one-shift channel has flux $\Theta(\sqrt{n})$ because a horizontal cut of size n which
isolates the top track splits n nets. Similarly, the modified one-shift of Figure 7.1 has flux one.
For the flux to equal two there must be a cut of size 8 which splits at least 6 nets, but since
every alternate column in either track is blank no such cut exists.
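
The definition can be made concrete with a brute-force sketch. The encoding is an assumption for illustration: `top` and `bottom` are equal-length lists of net labels with 0 for an empty grid point, the channel contains no trivial nets, and only cuts lying within the given columns are tried. The function names are hypothetical.

```python
from collections import Counter

def nets_split(track, other, start, size):
    """Count distinct nets split by the horizontal cut that isolates
    columns [start, start + size) of `track` from the rest of the channel."""
    total = Counter(x for x in track + other if x)
    inside = Counter(x for x in track[start:start + size] if x)
    # A net is split when it has terminals inside the cut and outside it.
    return sum(1 for net, k in inside.items() if k < total[net])

def flux(top, bottom):
    """Largest f such that some cut of size 2f^2 splits >= 2f^2 - f nets."""
    c = len(top)
    best, f = 0, 1
    while 2 * f * f <= c:
        size = 2 * f * f
        for track, other in ((top, bottom), (bottom, top)):
            if any(nets_split(track, other, s, size) >= size - f
                   for s in range(c - size + 1)):
                best = f
        f += 1
    return best
```

On an 8-net one-shift channel this gives flux 2, while a one-shift channel with every alternate column blank gives flux 1, in line with the examples in the text.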

Using Brown and Rivest’s argument for the one-shift channel, we next show that flux is 


indeed a lower bound on channel width. 


Theorem 7.1. Every channel with density d and flux f requires channel width at least
max(d, f).


Proof. Find a horizontal cut of the channel which spans $2f^2$ columns and splits at least
$2f^2 - f$ nontrivial nets. For each nontrivial net split by the cut, choose any two terminals
from different columns that lie on opposite sides of the cut.

Consider the channel formed by the set of chosen terminals, i.e., assume that all columns
which do not contain a chosen terminal are blank. This new channel consists of at least $2f^2 - f$
nontrivial two-point nets. Moreover, at most f of the $2f^2$ columns spanned by the original cut
may be empty. By the same argument used to prove Theorem 6.1, no more than f + 2 of the
nontrivial nets can be routed into the correct column on the first track: f into empty columns
and one out each side of the cut. After the first track, there are at most f + 2 empty columns,
the extra two having possibly been created by wires exiting across the side of the cut in the
first track. Thus, at most f + 4 nontrivial nets can be routed into the correct column on the
second track. In general, at most f + 2i nontrivial nets can be routed into the correct column
on the ith track.

Let w be the minimum width for which a wiring exists. By the preceding argument, the
total number of nets that can change columns anywhere in the channel is no greater than
$\sum_{1 \le i \le w} (f + 2i) = wf + w(w+1)$. But since at least $2f^2 - f$ nontrivial nets must eventually
be routed, it follows that $wf + w(w+1) \ge 2f^2 - f$, or $w \ge f$. Thus the original problem
requires a channel of width at least f. Finally, since the density d also is a lower bound on
channel width, the theorem follows. ∎
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Flux is negligibly small in practice, and for all purposes never exceeds three or four. One 
explanation for this is that terminals are movable; it is good engineering practice to leave
enough empty space so that if the channel is congested, then the terminals can be moved
slightly to allow a better wiring. Moreover, many columns contain fewer than two terminals,
and a large fraction of nets contain terminals that are close together on the same side of the
channel. These are precisely the conditions that make flux small. Finally, unlike density, flux
is a local phenomenon and is less likely to grow with the size of a channel or the total number
of nets. As an example, Deutsch’s “difficult problem” [20] has 72 nets, 174 columns and density


19, but the flux is just 3. 


7.2. An Approximation Algorithm for Top-to-bottom Nets 


In this section we present a linear-time approximation algorithm for routing channels with
two-point nets. It is assumed that each net is nontrivial and has exactly two terminals, one each
on the top and bottom tracks. Section 7.4 extends this algorithm to general multi-point
net channels.

The input to the algorithm may be presented in one of two ways. It might consist of a list
of columns, each entry describing the terminals in the top and bottom tracks in that column 
(possibly none). A more compact representation is a list of nets, each net itself being a list 
describing the positions of terminals in that net. The algorithm outputs a detailed wiring of 
the channel. The length of the output is proportional to the total wire area used to route the 
channel. 
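
The two input representations can be sketched as follows. The concrete encoding is an assumption made for illustration: the column form is a pair of lists of net labels (0 for an empty grid point), and the net form maps each label to a list of `(side, column)` terminals. Both conversions make a single pass over their input.

```python
def columns_to_nets(top, bottom):
    """Column list -> net list: group terminal positions by net label."""
    nets = {}
    for col, (t, b) in enumerate(zip(top, bottom)):
        if t:
            nets.setdefault(t, []).append(("top", col))
        if b:
            nets.setdefault(b, []).append(("bottom", col))
    return nets

def nets_to_columns(nets, num_columns):
    """Net list -> column list: place each terminal's label in its column."""
    top = [0] * num_columns
    bottom = [0] * num_columns
    for label, terminals in nets.items():
        for side, col in terminals:
            (top if side == "top" else bottom)[col] = label
    return top, bottom
```

The net form is the more compact of the two when the channel has many empty columns, since it stores nothing for them.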

The running time of the algorithm will be measured as a function of the shortest possible
output. This is more reasonable than measuring time as a function of the length of the input 
because the length of the output is always at least as large as the length of the input. In fact, 
the output is generally much longer than the length of the input. 

With this convention for measuring the running time, it is straightforward to see that either
input representation described above may be converted to the other in linear time. Moreover,
if the total number of columns in the channel is c, and if the channel has flux f and density d,
the minimum area required to route the channel is at least $\Omega(c(d + f))$. The running time of
our algorithm is bounded above by O(c(d + f)), so that it is a linear-time algorithm.

Figure 7.2: The regions routed in each phase.

The algorithm proceeds in four phases. Figure 7.2 sketches the regions routed within the 
different phases. The first two phases distribute empty columns uniformly across the channel, 
thereby dividing the channel into blocks each containing a small number of empty columns. 
This creates a new channel routing problem with possibly higher density, but with reduced 
flux. The third phase, the heart of the algorithm, routes the correct number of wires between 
blocks, without worrying about which columns within a block these wires lie in. Finally, the 
fourth phase routes the wires within each block into the correct column. The empty columns 
within each block allow a block to be wired independently of other blocks, so that every block 


is wired simultaneously on the same horizontal tracks. 


The Top-to-bottom Channel Routing Algorithm 


Phase 1: Partition the channel into groups. 


Find the least integer k such that the channel can be partitioned into groups of $k^2$
consecutive columns, each group containing at least 3k empty grid points in both the top
and bottom tracks. (An empty grid point is one at which no terminal is placed.) This
can be accomplished by trying successive values for k (starting with 1, 2, 3, ...) until the
constraint is satisfied.

The definition of flux guarantees that k does not exceed 6(f + 1). For, suppose that
k = 6(f + 1) does not satisfy the constraint. Then some group of $36(f + 1)^2$ columns
contains fewer than 18(f + 1) empty grid points on one track. If we partition this group
into 18 blocks, each of size $2(f + 1)^2$, then one of them must have fewer than (f + 1) empty
grid points on one track. But this means that the flux is at least f + 1, a contradiction.


Phase 2: Distribute empty points uniformly. 


Divide each group of $k^2$ columns into k blocks of k columns each. Route wires from the
first 3 points (if non-empty) on the top track of each block into columns that are empty 
on the top track. Since each group has at least 3k empty points on the top track, this 
routing can be easily accomplished using no more than 3k horizontal tracks. Repeat the 
same for the bottom track, so that the original channel is reduced to one which can be 
partitioned into blocks of size k such that the leftmost 3 columns of each block are empty. 
The significance of having 3 empty points in each block will be made clear in the detailed 
interblock routing of Phase 3. Observe that although the density of the resulting channel 


may be greater than the density d of the original channel, it can be no greater than d+ 6k. 


Phase 3: Route wires between blocks. 


This phase routes the correct number of wires between different blocks: if z nets have one 
terminal in the top track of block A and the second terminal in the bottom track of block 
B, then route z wires from the top track of block A to the bottom track of block B. It is 
not necessary that the wires be routed into the correct columns, but only that the correct 
number are routed between blocks. This phase is relatively complicated and forms the core 
of the overall strategy. At most d + 3k horizontal tracks are used. Details are described


later in this section. 


Phase 4: Route wires within each block.


At the end of Phase 3, all that remains is the problem of routing within each block. Each 
block has at most k nets and at least three empty columns. The location of each net is


determined in Phases 2 and 3. Each net may be routed entirely within its block using, 




for example, the algorithm of Kawamoto and Kajitani [36], which uses no more than
3k horizontal tracks. Moreover, every block can be simultaneously routed on the same


horizontal tracks, so that this phase uses at most 3k tracks. 


Specifically, the nets are routed one per track: the order of routing is determined by 
constraints caused by a top terminal for one net lying above a bottom terminal of another 
net. When a cycle of constraints occurs, one net of the involved cycle is temporarily routed 
into an empty column to eliminate one constraint, and routed to its other terminal after
the other nets in the cycle have been routed. Two tracks are used to route the last net in
each such cycle of constraints. ∎
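
The ordering rule described for Phase 4 can be sketched as follows. This is not the Kawamoto and Kajitani algorithm itself, only an illustration of the constraint ordering and cycle breaking that the text describes, under stated assumptions: each block is given as two lists of net labels (0 for empty), every net has one top and one bottom terminal in the block, and the function name is hypothetical.

```python
def phase4_track_order(top, bottom):
    """Return (order, broken, tracks): the top-to-bottom order in which nets
    are completed, the nets detoured to break cycles, and the track count."""
    above = {}  # above[a] == b: net a must lie on a higher track than net b
    for t, b in zip(top, bottom):
        if t and b and t != b:
            above[t] = b
    nets = {x for x in top + bottom if x}
    has_pred = set(above.values())
    order, broken, seen = [], [], set()
    # Chains of constraints: start from nets that nothing must precede.
    for start in sorted(nets - has_pred):
        n = start
        while n is not None and n not in seen:
            seen.add(n)
            order.append(n)
            n = above.get(n)
    # Every remaining net lies on a cycle of constraints.
    for start in sorted(nets - seen):
        if start in seen:
            continue
        broken.append(start)  # detoured into an empty column first
        n = above[start]
        while n != start:
            seen.add(n)
            order.append(n)
            n = above[n]
        seen.add(start)
        order.append(start)  # completed after the rest of its cycle
    # One track per net, plus one extra track for each broken cycle.
    return order, broken, len(nets) + len(broken)
```

Each net has at most one constraint in and one out, so the constraint graph is a collection of simple chains and cycles, which is why a single pass suffices.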


Next, we present the detailed routing of Phase 3. Each net is first classified into one of
three categories. If both terminals of a net lie in the same block then the net is said to be a
vertical net. Otherwise, if the terminals are in different blocks and if the top terminal is to the
left of the bottom terminal, then the net is called a falling net. Finally, if the terminals are in 
different blocks and if the top terminal is to the right of the bottom terminal, then the net is 
called a rising net. 
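
The three-way classification is simple enough to state as code. The sketch assumes that columns are numbered from 0 and that blocks are consecutive runs of k columns starting at column 0; `classify_net` is a name introduced here for illustration.

```python
def classify_net(top_col, bottom_col, k):
    """Classify a two-point net by the blocks holding its terminals."""
    if top_col // k == bottom_col // k:
        return "vertical"  # both terminals in the same block
    # Different blocks: compare the horizontal positions of the terminals.
    return "falling" if top_col < bottom_col else "rising"
```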

The interblock routing procedure performs a left to right scan across the channel, routing
each block completely before proceeding to the next block. Between any two consecutive blocks,
the rising nets run along the upper horizontal tracks, the falling nets run along the lower tracks, 
and every empty horizontal track lies between the tracks containing the rising and falling nets. 

In some cases a wire must be routed through previously routed blocks on the left before 
it can proceed to the right. This requires that space be maintained for wires to backtrack 
(pun intended) when necessary. By keeping the empty tracks between the rising and falling 
nets within each block, we can coalesce the empty tracks in consecutive blocks to form the 
pyramid shown in Figure 7.3. Pyramids are crucial to backtracking; as an example, Figure 7.3 
illustrates how a “blocked” wire can backtrack through the pyramid on its way right. After a 
wire backtracks through the pyramid, the pyramid is updated as shown. 

The following outline describes the interblock routing procedure in detail. Each of the 


steps is illustrated in Figure 7.4. Figure 7.4a shows the initial situation just before a new
block is entered. The arrows on the tracks indicate whether the net is a rising/falling net that
terminates within the block, or whether the net terminates in a different block on the right.
The empty tracks are contained within the pyramid shown. In the case when the block to be
routed is the leftmost block, the pyramid contains all horizontal tracks and extends to the left
of the channel.

Figure 7.3: Maintaining a pyramid for backtracking.


The Interblock Routing Procedure 


Step 1: Ending nets. 


Nets with one terminal in a block on the left and the other in the current block are called 
ending nets. By moving the lowest ending rising net upward and the highest ending falling 
net downward wherever possible, the ending nets can be routed in a staircase pattern as 


shown in Figure 7.4b. 


Step 2: Continuing nets. 


Nets with one terminal in a block on the left and the other terminal in a block to the right 


of the current block are called continuing nets. Route the rising (falling) continuing nets
through the block by shifting them up to higher (lower) tracks in a staircase pattern that
fits the staircase pattern of the ending nets.


As shown in Figure 7.4c, the staircase pattern of the continuing nets blocks one grid point
in the top track as well as in the bottom track (unless the block has no ending nets). In 
other words, no net can begin at the grid points shown. However, remember that Phase 
2 provides at least 3 empty grid points on either track in each block. Since we are free 
to place these empty grid points in any position, we still have at least two empty points 


remaining on either track.


Step 3: Balancing. 


Suppose the number of ending rising nets is greater than the number of ending falling 
nets. Balance the difference by routing some starting rising nets (those which originate in
the block) as shown in Figure 7.4d. In case there are more ending falling nets than ending 


rising nets, follow a symmetrically opposite procedure. 


In order to ensure that every empty column remains between the rising and falling nets it 
may be necessary to force one more empty grid point on the bottom track. Similarly, one
grid point in the top track is forced to be empty because it is blocked by the rightmost 
starting rising net. At the end of this step, observe that the pyramid may be updated as 


shown in Figure 7.4e. 


Step 4: Starting nets. 


Suppose again that the number of ending rising nets is greater than the number of ending 
falling nets. After balancing the columns in Step 3, route all the starting falling nets as 
shown in Figure 7.4f. Observe that one more grid point on the bottom track is blocked, 


and therefore must be empty. Follow a symmetric procedure in the opposite case. 


Step 5: Remaining nets. 


At this stage either starting rising nets or starting falling nets remain to be wired. Suppose
that some starting rising nets remain. Route these nets as shown in Figure 7.4g, making
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use of the pyramid to backtrack whenever necessary. In case the number of remaining 
starting nets equals the number of starting falling nets routed in Step 4, then route the 


last starting rising net using the empty column from Step 3. 


Step 6: Vertical nets. 


Route the vertical nets in the natural way as shown in Figure 7.4h. Note that no extra 


empty points are required. ∎


Figure 7.4h shows the complete routing for the block, as well as the updated pyramid 
structure. Observe that the initial conditions are satisfied for routing the next block on the 
right. Furthermore, note that no more than 3 points on any track are required to be empty, so
that Phase 2 of the main algorithm distributes sufficiently many empty grid points throughout 
the channel. 

Since every ending net is routed before every starting net, the total number of horizontal 
tracks used is no greater than d+ 6k, the density of the resulting channel at the end of Phase 
2. Consequently, the number of horizontal tracks used by the main algorithm is at most 


d+15k =d+O(f). 
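
The total can be tallied phase by phase. The breakdown below is a reading of the preceding text rather than a statement taken from it: 3k tracks each for the top and bottom redistribution of Phase 2, at most d + 6k tracks for Phase 3, and 3k tracks for Phase 4, together with the bound k ≤ 6(f + 1) from Phase 1.

```latex
\begin{align*}
\underbrace{3k + 3k}_{\text{Phase 2}}
  + \underbrace{d + 6k}_{\text{Phase 3}}
  + \underbrace{3k}_{\text{Phase 4}}
  &= d + 15k \\
  &\le d + 15 \cdot 6(f + 1) = d + 90(f + 1) = d + O(f).
\end{align*}
```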


7.3. Running Time Analysis 


To analyze the running time of the algorithm we shall calculate the running time of each 
phase separately. Suppose that a channel has c columns, density d, and flux f. Then, as shown 
earlier, $\Omega(c(d + f))$ is a lower bound on the minimum area needed to wire the channel. As
shown below, this is also an upper bound on the running time of the algorithm. 

The first phase computes the smallest integer k for which the channel can be divided into
groups of k² columns each such that every group has at least 3k empty grid points in both
the top and bottom tracks. The value of k is computed by successively trying every integer
(starting with 1, 2, ...) until the condition is satisfied. For any possible value i, the size of each
group is i² and there are c/i² groups in all. The required condition can easily be checked for
each group in time O(i²) so that the total time for each candidate value is O(c). The total
time for Phase 1 is therefore no more than O(ck). But, since k < 6(f + 1), this is no greater
than O(cf).
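The search for k in Phase 1 can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name, the encoding of each track as a list with None marking an empty grid point, and the decision to ignore a trailing partial group are all assumptions made here.

```python
def smallest_group_parameter(top, bottom):
    """Return the least k >= 1 such that every full group of k*k consecutive
    columns has at least 3*k empty points on both the top and bottom tracks.

    top, bottom: one entry per column; None marks an empty grid point
    (a hypothetical encoding chosen for this sketch).
    """
    c = len(top)
    k = 1
    while True:
        size = k * k
        ok = True
        # Only full groups are checked. Ignoring a trailing partial group is
        # an assumption; it also guarantees termination once k*k exceeds c.
        for start in range(0, c - size + 1, size):
            empty_top = sum(1 for t in top[start:start + size] if t is None)
            empty_bottom = sum(1 for b in bottom[start:start + size] if b is None)
            if empty_top < 3 * k or empty_bottom < 3 * k:
                ok = False
                break
        if ok:
            return k
        k += 1
```

Each candidate value costs one pass over the c columns, so trying 1, 2, ..., k costs O(ck) in total, matching the bound stated above.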

In the second phase, empty columns are evenly distributed among the different blocks 
within each group. Each wire runs along one horizontal track so that the time is no more than
the total length of wire laid out. Since no more than 3k tracks are used, the total wire length 
does not exceed O(ck) = O(cf). 

Phase 3 is slightly more complicated to analyze. As long as wires do not change direction, 
the time to lay them out is never more than the length of wire laid. However, whenever a 
wire must turn a corner or backtrack, the time requirements can potentially increase. A priori, 
it seems that maintaining the pyramid structure is time consuming; furthermore, the time to 
update the pyramid each time can be significantly large. 

Fortunately, however, the pyramid is only an aid in understanding why the algorithm works 
correctly; there is no need to explicitly maintain the pyramid at all. Any time a wire must 
backtrack, all we really have to do is to simultaneously backtrack along the uppermost and
lowermost empty tracks until a column, which is empty between the two tracks, is encountered. 
In fact, following this procedure gives the same routing as with the pyramid. It is relatively 
straightforward to argue that, with the modified strategy, the total time spent in Phase 3 is 
no more than O(c(d + k)) = O(c(d + f)). 
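The pyramid-free backtracking rule can be sketched as a leftward scan. This is a simplified illustration under stated assumptions: the channel is modelled as a hypothetical boolean grid occupied[row][col] with row 0 at the top, upper and lower are the row indices of the uppermost and lowermost empty tracks, and the sketch only locates the column without laying any wire.

```python
def backtrack_to_free_column(occupied, upper, lower, start_col):
    """Scan left from start_col and return the first column that is empty on
    every row from `upper` to `lower` inclusive, or None if there is none.

    occupied[row][col] is True where a wire already uses the grid point
    (a hypothetical representation chosen for this sketch).
    """
    for col in range(start_col, -1, -1):
        if not any(occupied[row][col] for row in range(upper, lower + 1)):
            return col
    return None
```

The scan touches each column at most once per backtrack, which is the kind of local work the analysis charges against the wire being laid.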

Finally, Phase 4 requires no more than O(cf) time. Each channel routing subproblem of 
size k can be routed in time O(k) using O(k) tracks. The total time over all subproblems is 
therefore O(ck) = O(cf). 

Summing up, we conclude that the running time of the algorithm is dominated by Phase
3, and does not exceed O(c(d + f)), which is linear in the area of the minimum area routing.


7.4. The Channel Routing Algorithm 


The algorithm of Section 7.3 routed two-point nets which had one terminal in the top
track and the other in the bottom track. This section extends the algorithm to multi-point
nets. As before, the algorithm is divided into four phases. Once again, we assume that the
channel has no trivial two-point nets, and has density d and flux f.

The General Channel Routing Algorithm


Phase 1: Partition the channel into groups. 


Find the least integer k for which the channel can be partitioned into groups of k²
consecutive columns, such that a horizontal cut of size k² which isolates either the top or
bottom track of any group splits at most k² - 3k nets. The value of k may be found by
trying successive values (starting with 1, 2, ...) until the required condition is satisfied.

As before, it may be verified that the value of k is bounded by O(f), where f is the flux
of the channel.


Phase 2: Distribute empty points uniformly. 


For each track within a group count the number p of empty points. If p ≥ 3k, then
distribute the empty points as before. If p < 3k then there are at least 3k - p duplicate
terminals within the group and on the same track. Choose any 3k - p duplicated terminals
and connect these to other terminals from the same net using one horizontal track for each
such net.

Next, pick one representative terminal for each duplicated net connected above. The
duplicate terminals, being already connected, may be ignored so that each group now has
at least 3k empty points on either track. Distribute these empty points uniformly as before
so that each block of size k has at least 3 empty points. Observe that the total number of
horizontal tracks used is O(k) = O(f).


Phase 3: Route wires between blocks. 


Although the basic strategy is the same as before, the major difference is that a net
may have representative terminals in many different blocks. (Within a block choose any
one representative terminal, if it exists, on each track.) The modified interblock routing
procedure is described later in this section, and uses no more than 2d + O(f) tracks.


Phase 4: Route wires within each block.


This phase remains essentially unchanged. The only difference is that within each block
the representative terminal of any net should be connected to all its duplicates. Although
the choice of representatives determines the number of horizontal tracks used, this never
exceeds O(f).


Next, we present the detailed interblock routing of Phase 3. Each net is first classified into 
one of four categories. A net whose leftmost terminal on the top track lies in the same block as
its leftmost terminal on the bottom track is called a vertical net. If the leftmost top terminal 
(i.e., on the top track) of a net falls in a block to the left of the block containing the leftmost 
bottom terminal (i.e., on the bottom track) of the net then the net is said to be a falling net. 
Conversely, if the block containing the leftmost top terminal of a net is to the right of the 
block containing the leftmost bottom terminal of the net then the net is called a rising net.
Finally, if all terminals of a net lie on the same track (either top or bottom) then the net is 
called a same-side net. 
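The four categories can be sketched in a few lines. This is only an illustration of the definitions above; the function name, the column-index encoding of terminals, and the assumption that blocks consist of k consecutive columns numbered from column 0 are choices made here, not details taken from the text.

```python
def classify_net(top_cols, bottom_cols, k):
    """Classify a net as 'vertical', 'falling', 'rising' or 'same-side'.

    top_cols, bottom_cols: column indices of the net's terminals on the top
    and bottom tracks. k: block width in columns. Blocks are assumed to be
    columns [0, k), [k, 2k), ... (a hypothetical convention for this sketch).
    """
    if not top_cols and not bottom_cols:
        raise ValueError("a net needs at least one terminal")
    if not top_cols or not bottom_cols:
        return "same-side"
    top_block = min(top_cols) // k        # block of the leftmost top terminal
    bottom_block = min(bottom_cols) // k  # block of the leftmost bottom terminal
    if top_block == bottom_block:
        return "vertical"
    if top_block < bottom_block:
        return "falling"
    return "rising"
```

For example, with k = 4 a net with top terminal in column 1 and bottom terminal in column 9 is falling, since its leftmost top terminal lies in block 0 and its leftmost bottom terminal in block 2.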

In addition, each net is divided into a rising portion and a falling portion. The rising 
portion of a net links the block containing the leftmost terminal to the blocks containing 
terminals in the top track of the channel. The falling portion of a net links the block containing 
the leftmost terminal to the blocks containing terminals in the bottom track of the channel. 
The interblock routing procedure connects the top terminals with the bottom terminals using 
a single connection emerging from the block containing the leftmost terminal. Figure 7.5 
illustrates the rising and falling portions of a net and where the connection is made. Observe 
that not every net is required to have both a rising as well as a falling portion. 
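The split into rising and falling portions can be sketched as follows. This is one possible reading of the definition, under the same hypothetical conventions as before (terminals given as column indices, blocks of k consecutive columns starting at column 0); each portion is reported simply as the sorted list of blocks it links.

```python
def net_portions(top_cols, bottom_cols, k):
    """Return (rising_blocks, falling_blocks) for one net.

    The rising portion links the block of the net's leftmost terminal to the
    blocks holding its top-track terminals; the falling portion does the same
    for bottom-track terminals. An empty list means the net has no such
    portion. The list-of-blocks representation is an assumption of this sketch.
    """
    cols = list(top_cols) + list(bottom_cols)
    if not cols:
        raise ValueError("a net needs at least one terminal")
    origin = min(cols) // k  # block containing the leftmost terminal
    rising = sorted({origin} | {c // k for c in top_cols}) if top_cols else []
    falling = sorted({origin} | {c // k for c in bottom_cols}) if bottom_cols else []
    return rising, falling
```

A net with no top terminals yields an empty rising list, which reflects the observation that not every net has both portions.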

As before, the procedure ensures that between consecutive blocks tracks containing rising 
portions of nets are above every empty track and that every empty track is above the tracks 
containing falling portions of nets. This allows us to once again maintain a pyramid structure 
for backtracking. 

The routing proceeds block-by-block from left to right in the middle 2d + O(f) tracks of
the channel. Each block is routed in seven steps described below. The steps are numbered to
coincide with the algorithm of Section 7.3. Figure 7.6 shows a complete routing of a block.

Figure 7.5: Dividing nets into rising and falling portions.
Some nets may have only a falling/rising portion.


The Interblock Routing Procedure 


Step 1: Ending nets. 


Route the ending nets (those which do not have a terminal to the right of the current
block) in staircase patterns at the left end of the block.


Step 2: Continuing nets.


Route the continuing nets (those with a terminal in a block to the right of the current
block) in staircase patterns nestled against those generated in Step 1. If a continuing net
also has a representative terminal in the current block, then place the terminal to the right
of the staircase and make a connection as shown in Figure 7.6.


Step 2.5: Starting same-side nets. 


Route every same-side net whose leftmost terminal lies in the current block in a staircase
fashion, bringing wires from the bottom (top) track to the lowest (highest) available empty
track.


Step 3: Balancing.


If more columns have been used at the top of the channel than at the bottom, make up
the difference by routing the rising portions of some starting rising nets. If the opposite
case holds, follow the symmetric procedure.

Figure 7.6: Complete Phase 3 routing within a block.


Step 4: Starting nets. 


Route the falling portions of starting falling nets (or the rising portions of starting rising
nets depending on which was in excess in Step 3).


Step 5: Remaining nets. 


Route the remaining rising portions of starting rising nets (or the falling portions of
remaining starting falling nets), using the pyramid for backtracking if necessary. Furthermore,
route the falling portions of starting rising nets and the rising portions of starting falling
nets in the straightforward way using empty tracks.


Step 6: Vertical nets. 


Route the vertical nets in empty columns as before. 
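The steps above treat a net differently depending on how it relates to the current block. A minimal sketch of that bookkeeping is given here; the status names "finished" and "not started", the function name, and the block convention (k consecutive columns per block, numbered from column 0) are assumptions introduced for illustration only.

```python
def status_in_block(cols, k, block):
    """Describe how a net relates to the block currently being routed.

    cols: column indices of all terminals of the net (top and bottom).
    k: block width in columns. block: index of the current block.
    """
    first = min(cols) // k  # block holding the leftmost terminal
    last = max(cols) // k   # block holding the rightmost terminal
    if last < block:
        return "finished"     # all terminals lie to the left
    if first > block:
        return "not started"  # all terminals lie to the right
    if first == block:
        return "starting"     # leftmost terminal is in this block
    if last == block:
        return "ending"       # no terminal to the right of this block
    return "continuing"       # some terminal lies further right
```

Under this reading, a net whose terminals all fall in the current block is reported as starting; how such nets are divided between the vertical-net step and Phase 4 is left to the text.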
Since the rising and falling portions of each net are effectively separated, the interblock
routing procedure requires no more than 2d + O(f) horizontal tracks. As before, it can be
argued that the overall algorithm runs in linear time, and routes a channel of density d and
flux f in width 2d + O(f). To summarize, we have shown the following.


Theorem 7.2. Every multi-point net channel with density d and flux f can be routed in width
no greater than 2d + O(f) in linear time.


Furthermore, if every net is a same-side net or only has a rising portion or a falling portion
(but not both) then the number of tracks used is d + O(f). In particular, for two-point net
channels we have the following result.


Theorem 7.3. Every two-point net channel with density d and flux f can be routed in width
d + O(f) in linear time.


CHAPTER 8 


Conclusions, Extensions and Open Problems 


This thesis was motivated by the need for a clearer understanding of various issues in
circuit layout. The techniques developed provide new insights and approaches for VLSI layout.
Although the results in their present form are theoretical in nature, it is likely that some of
the techniques can be adapted for use in practice.


The two parts of the thesis share a common underlying methodology. First, the critical
properties that determine the quality of a layout are identified. In the next step, these properties
are effectively exploited to obtain good layouts. Thus, for example, the minimum bifurcator
of a graph gives a lower bound on layout area, and good layouts can be found quickly if a
decomposition is available. Similarly, flux and density give lower bounds on channel width;
they also provide the basis for a fast, provably good channel routing algorithm.


The strategy for VLSI graph layout in Part I provides a simple and uniform technique for
solving a variety of layout problems efficiently. The unified framework is suitable for custom
layout, and at the same time is efficient with regard to area, delay, and fault-tolerance. The
tree of meshes, in particular, emerges as a surprisingly versatile and powerful network for
circuit layout. A priori, there is no reason to believe that such diverse concerns can be handled
simultaneously in a compatible manner, let alone within a common framework.


Approaching the channel routing problem from a theoretical viewpoint, Part II characterizes
the properties that make Manhattan routing difficult. These properties then form the
basis of a new, linear-time approximation algorithm that is guaranteed to always find a
near-optimal routing. In contrast, although the problem had been studied intensively for over a
decade from an engineering viewpoint, all previous heuristics could be made to perform
arbitrarily poorly on certain inputs.

These results notwithstanding, a number of problems are left unresolved in this thesis.
The following sections mention some of the more important open problems, and also sketch
extensions to the results reported. More details on specific problems may be found in [7].


8.1. Problems in Graph Layout 


The divide-and-conquer strategy based on graph bifurcators has also been successfully applied
by Leighton and Rosenberg [46] to the study of three-dimensional VLSI circuit layout.
In addition, the techniques and results are also applicable to graph and data-structure
embeddings, and also provide bounds on one- and two-dimensional bandwidth minimization.


Question 1. How much area is required to lay out an N-node planar graph? The best
universal upper bound is O(N lg² N) [49, 83] while the best existential lower bound (for
the tree of meshes) is Ω(N lg N) [40, 41].


Question 2. Is there a polynomial time algorithm for laying out trees with edges not much
longer than the minimax edge length? The best tree layout algorithm (Chapter 3) produces
layouts with edges of length Θ(√N / lg N). Although this is optimal for some trees, it is
far from optimal for others.


Question 3. Is there a better way to realize a network in an environment that contains 
defective processors? The results of Chapter 5 guarantee that any graph can be realized 
using the good processors provided the “channels” have width O(& Ig ) in a regular 
layout. Although this bound is optimal for some networks [7], it is not known to be
optimal for simpler networks such as two-dimensional arrays.


Question 4. Is there a provably good heuristic for graph bisection? Any such heuristic
could be used to find efficient decomposition trees and bifurcators, which, in turn, could
be used to produce good layouts [7, 42]. There are many heuristics which do very well in
practice [13, 17, 24, 37, 66, 71]. Analyzing these or developing new heuristics along similar
lines is likely to have an impact on VLSI layout.


Question 5. Can the framework be extended to deal with processors of variable size and
shape? While it is relatively easy to deal with equal-size processors, any progress toward
the general problem would be very interesting.


8.2. Problems in Channel Routing 


While the algorithms of Chapter 7 are fast and are guaranteed to produce near-optimal 
routings, the analysis of the constant factors leaves much room for improvement. In particular, 
the actual number of tracks used by the algorithm may be much less than the upper bounds 
indicate. 

For example, if the empty grid points are already uniformly distributed to begin with, 
then Phase 2 needs to perform only a minor redistribution of empty points. Consequently, the 
upper bound of 6k < 36(f + 1) tracks to redistribute empty points is a gross overestimate. On
the other hand, if the empty points are not uniformly distributed, but are bunched together in 
groups, then the actual lower bound is underestimated by flux. To see this, observe that along 
a horizontal track at most two wires can turn into a blank column inside a bunch of empty 
columns. However, the lower bound argument for flux does not take the density/frequency of 
blank points into consideration. Since flux underestimates the true bound in this case, once 
again, we see that the performance of the algorithm is much better in relation to the actual 
value than what the bounds indicate. 

In addition, it is possible to obtain tighter bounds more directly, by redefining the notion of
flux. Rather than making horizontal cuts in the channel, it is better to apply the argument to
“windows,” i.e., groups of contiguous columns. This is the idea adopted by Brown and Rivest
in their lower bound arguments. The advantage of this lower bound strategy is that if many
wires are forced to change columns within the window, then the lower bound is very high. On
the other hand, if many wires exit across the sides of the window, then the width must again
be large since at most two wires can exit the window along a horizontal track. Is it possible
to redefine the notion of flux to capture some of these bounds? What is the best definition for
flux? Finally, do multi-point nets really require 2d + O(f) tracks, or will d + O(f) suffice?

At a more general level, it would be interesting to investigate the applicability of flux to
other wiring problems, such as, for example, the switchbox problem. In conclusion, we mention
that Baker, Bhatt, and Leighton [3] extend the results of the Manhattan wiring model to the
case where contact cuts are larger than wires. In this case it turns out that flux is never more
than a constant, so that density is the sole limiting factor on channel width.